Abstract

We consider transmission of stationary and ergodic sources over non-ergodic composite channels 
with channel state information at the receiver (CSIR). Previously we introduced alternate capacity 
definitions to Shannon capacity, including the capacity versus outage and the expected capacity. These 
generalized definitions relax the constraint of Shannon capacity that all transmitted information must be 
decoded at the receiver. In this work alternate end-to-end distortion metrics such as the distortion versus 
outage and the expected distortion are introduced to relax the constraint that a single distortion level has 
to be maintained for all channel states. For transmission of stationary and ergodic sources over stationary 
and ergodic channels, the classical Shannon separation theorem enables separate design of source and 
channel codes and guarantees optimal performance. For generalized communication systems, we show 
that different end-to-end distortion metrics lead to different conclusions about separation optimality even 
for the same source and channel models. 

Separation does not imply isolation - the source and channel still need to communicate with 
each other through some interfaces. For Shannon separation schemes, the interface is a single-number 
comparison between the source coding rate and the channel capacity. Here we include a broader class 
of transmission schemes as separation schemes by relaxing the constraint of a single-number interface. 
We show that one such generalized scheme guarantees the separation optimality under the distortion 
versus outage metric. Under the expected distortion metric, separation schemes are no longer optimal. 
We expect a performance enhancement when the source and channel coders exchange more information 
through more sophisticated interfaces, and illustrate the tradeoff between interface complexity and end-to-end
performance through the example of transmitting a binary symmetric source over a composite
binary symmetric channel. 
Source-Channel Coding and Separation for
Generalized Communication Systems 

I. Introduction 

The time-varying nature of the underlying channel is one of the most significant design 
challenges in wireless communication systems. In particular, real-time media traffic typically has 
a stringent delay constraint, so the exploitation of long blocklength frames is infeasible and the 
entire frame may fall into deep fading channel states. Furthermore, the receiver may have limited 
resources to feed the estimated channel state information back to the transmitter, which precludes 
adaptive transmission and forces the transmitter to use a stationary coding strategy. The above 
described situation is modeled as a slowly fading channel with receiver side information only, 
which is an example of a non-ergodic composite channel. A composite channel is a collection of 
component channels $\{W_S : S \in \mathcal{S}\}$ parameterized by S, where the random variable S is chosen
according to some distribution p(S) at the beginning of transmission and then held fixed. We
assume the channel realization is revealed to the receiver but not the transmitter. This class of
channel is also referred to as the mixed channel [1] or the averaged channel [2] in the literature.

The Shannon capacity of a composite channel is given by the Verdú-Han generalized capacity
formula [3]

$$C = \sup_{X} \underline{I}(X;Y),$$

where $\underline{I}(X;Y)$ is the liminf in probability of the normalized information density. This formula
highlights the pessimistic nature of the Shannon capacity definition, which is dominated by 
the performance of the "worst" channel, no matter how small its probability. To provide more 
flexibility in capacity definitions for composite channels, in [4], [5] we relax the constraint that 
all transmitted information has to be correctly decoded and derive alternate definitions including 
the capacity versus outage and the expected capacity. The capacity versus outage approach 
allows certain data loss in some channel states in exchange for higher rates in other states. It 
was previously examined in [6] for single-antenna cellular systems, and later became a common
criterion for multiple-antenna wireless fading channels [7]-[9]. See [10, Ch. 4] and references
therein for more details. The expected capacity approach also requires the transmitter to use a
single encoder but allows the receiver to choose from a collection of decoders based on channel
states. It was derived for a Gaussian slow-fading channel in [11], and for a composite binary 
symmetric channel (BSC) in [12]. 

Channel capacity theorems deal with data transmission in a communication system. When 
extending the system to include the source of the data, we also need to consider the data 
compression problem which deals with source representation and reconstruction. For the overall 
system, the end-to-end distortion is a well-accepted performance metric. When both the source 
and channel are stationary and ergodic, codes are usually designed to achieve the same end-to-end 
distortion level for any source sequence and channel realization. Nevertheless, practical systems 
do not always impose this constraint. If the channel model is generalized to such scenarios as 
the composite channel above, it is natural to relax the constraint that a single distortion level has 
to be maintained for all channel states. In parallel with the development of alternative capacity 
definitions, we introduce generalized end-to-end distortion metrics including the distortion versus 
outage and the expected distortion. The distortion versus outage is characterized by a pair $(q, D_q)$,
where the distortion level $D_q$ is guaranteed in receiver-recognized non-outage states of probability
no less than $(1 - q)$. This definition requires CSIR, based on which the outage can be declared.
The expected distortion is defined as $E_S D_S$, i.e. the achievable distortion $D_S$ in channel state
S averaged over the underlying distribution p(S). These alternative distortion metrics were also
considered in prior works. In [13] the average distortion $q\sigma^2 + (1 - q)D_q$, obtained by averaging
over outage and non-outage states, was adopted as a fidelity criterion to analyze a two-hop fading
channel. Here $\sigma^2$ is the variance of the source symbols. The expected distortion was analyzed for
the MIMO block fading channel in the high SNR regime [14] and in the finite SNR regime [15],
[16]. Various coding schemes for expected distortion were also studied in a slightly different but
closely related broadcast scenario [17]-[19].

Data compression (source coding) and data transmission (channel coding) are two fundamental 
topics in Shannon theory. For transmission of a discrete memoryless source (DMS) over a discrete 
memoryless channel (DMC), the renowned source-channel separation theorem [20, Theorem 
2.4] asserts that a target distortion level D is achievable if and only if the channel capacity C 
exceeds the source rate distortion function R(D), and a two-stage separate source-channel code 
suffices to meet the requirement. This theorem enables separate designs of source and channel
codes with guaranteed optimal performance. It also extends to stationary and ergodic source and
channel models [22], [23]. Separate source-channel coding schemes provide flexibility through
modularized design. From the source's point of view, the source can be transmitted over any 
channel with capacity greater than R(D) and be recovered at the receiver subject to a certain 
fidelity criterion (the distortion D). The source is indifferent to the statistics of each individual 
channel and consequently focuses on source code design independent of channel statistics. 

Despite their flexibility and optimality for certain systems, separation schemes also have their 
disadvantages. First of all, the source encoder needs to observe a long-blocklength source 
sequence in order to determine the output, which causes infinite delay. Second, separation 
schemes may increase complexity in encoders and decoders because the two processes of source 
and channel coding are acting in opposition to some extent. Source coding is essentially a data 
compression process, which aims at removing redundancy from source sequences to achieve the 
most concise representation. On the other hand, channel coding deals with data transmission, 
which tries to add some redundancy to the transmitted sequence for robustness against the channel 
noise. If the source redundancy can be exploited by the channel code, then a joint source-channel 
coding scheme may avoid this overhead. In particular, transmission of a Gaussian source over a 
Gaussian channel, and a binary symmetric source over a BSC, are both examples where optimal 
performance can be achieved without any coding [24]. This is because the source and channel 
are "matched" to each other in the sense that the transition probabilities of the channel solve the 
variational problem defining the source rate-distortion function R(D) and the letter probabilities 
of the source drive the channel at capacity [25, p.74]. 

A careful inspection of the Shannon separation theorem reveals some important underlying 
assumptions: a single-user channel, a stationary and ergodic source and channel, and a single 
distortion level maintained for all transmissions. Violation of any of these assumptions will 
likely prompt reexamination of the separation theorem. For example, Cover et al. showed that
for a multiple access channel with correlated sources, the separation theorem fails [26]. In [27]
Vembu et al. gave an example of a non-stationary system where the source is transmissible
through the channel with zero error, yet its minimum achievable source coding rate is twice the
channel capacity. In this work, we illustrate that different end-to-end distortion metrics lead to
different conclusions about separability even for the same source and channel model. In fact,
source-channel separation holds under the distortion versus outage metric but fails under the
expected distortion metric. In [28] we proved the direct part of source-channel separation under
the distortion versus outage metric and established the converse for a system of Gaussian source
and slow-fading Gaussian channels. Here we extend the converse to more general systems of
stationary sources and composite channels.

1 The separation theorem for lossless transmission [21] can be regarded as a special case of zero distortion.

Source-channel separation implies that the operation of source and channel coding does 
not depend on the statistics of the counterpart. However, the source and channel do need to 
communicate with each other through a negotiation interface even before the actual transmission 
starts. In the classical view of Shannon separation for stationary ergodic sources and channels, 
the source requires a rate R(D) based on the target distortion D and the channel decides if it can 
support the rate based on its capacity C. For generalized source/channel models and distortion 
metrics, the interface is not necessarily a single rate and may allow multiple parameters to 
be agreed upon between the source and channel. After communication through the appropriate 
negotiation interface, the source and channel codes may be designed separately and still achieve 
the optimal performance. Vembu et al. studied the transmission of non-stationary sources over
non-stationary channels and observed that the notion of (strict) domination [27, Theorem 7] 
dictates whether a source is transmissible over a channel, instead of the simple comparison 
between the minimum source coding rate and the channel capacity. The notion of (strict) 
domination requires the source to provide the distribution of the entropy density and the channel 
to provide the distribution of the information density as the appropriate interface. 

The source-channel interface concept also applies after the actual transmission starts. At the 
transmitter end, we see examples where the source sequence is directly supplied to the channel, 
such as the uncoded transmission of a Gaussian source over a Gaussian channel. But more 
generally there is certain processing on the source side, and the processed output, instead of the 
original source sequence, is supplied to the channel. The transmitter interface contains what the 
source actually delivers to the channel. For example, in separation schemes the interface is the 
source encoder output; in hybrid digital-analog schemes [19] the interface is a combination of 
vector quantizer output and quantization residue. Similarly we can introduce the concept of a 
receiver interface. Instead of directly delivering the channel output sequence to the destination, 
the receiver may implement certain decoding and choose the channel decoder output as the
interface. The interfaces at the transmitter and the receiver are the same in classical Shannon 
separation schemes, since the channel code requires all transmitted information to be correctly 
decoded with vanishing error, but in general the two interfaces can be different. For example, 
the receiver interface may include an outage indicator or partial decoding when considering 
generalized capacity definitions. 

Different transmission schemes can be compared by their end-to-end performance. Neverthe- 
less, the concept of source-channel interface opens a new dimension for comparison. Ideally 
the interface complexity should be measured by some quantified metrics. Transmission schemes 
with low interface complexity are also appealing in view of simplified system design. We expect 
a performance enhancement when the source and channel exchange more information through 
a more sophisticated interface, and illustrate the tradeoff between interface complexity and end- 
to-end performance through some examples in this work. 

The rest of the paper is organized as follows. We review alternative channel capacity definitions
and define corresponding end-to-end distortion metrics in Section II. In Section III we provide
a new perspective of source-channel separation generalized from Shannon's classical view and
also introduce the concept of source-channel interface. In Section IV we establish the separation
optimality for transmission of stationary ergodic sources over composite channels under the
distortion versus outage metric. In Section V we consider various schemes to transmit a binary
symmetric source (BSS) over a composite BSC and show the tradeoff between achievable
expected distortion and interface complexity. Conclusions are given in Section VI.

II. Generalized Performance Metrics 

We first review alternate channel capacity definitions derived in [4], [12] to provide some 
background information. We then define alternate end-to-end performance metrics for the entire 
communication system, including the source and the destination. 

A. Background: Channel Capacity Metrics 

The channel W is statistically modeled as a sequence of n-dimensional conditional distributions
$\mathbf{W} = \{W^n = P_{Z^n|X^n}\}_{n=1}^{\infty}$. For any integer n, $W^n$ is the conditional distribution from the
input space $\mathcal{X}^n$ to the output space $\mathcal{Z}^n$. Let X and Z denote the input and output processes,
respectively. Each process is specified by a sequence of finite-dimensional distributions, e.g.

$$\mathbf{X} = \{X^n = (X_1^{(n)}, \ldots, X_n^{(n)})\}_{n=1}^{\infty}.$$

In a composite channel, when the channel side information is available at the receiver, we 
represent it as an additional channel output. Specifically, we let $Z^n = (S, Y^n)$, where S is
the channel side information and $Y^n$ is the output of the channel described by parameter S.
Throughout, we assume the random variable S is independent of X and unknown to the encoder.
Thus for each n

$$P_{W^n}(z^n|x^n) = P_{Z^n|X^n}(s, y^n|x^n) = P_S(s)\, P_{Y^n|X^n,S}(y^n|x^n, s). \tag{1}$$
The information density is defined similarly as in [3]:

$$i_{X^n W^n}(x^n; z^n) = \log \frac{P_{W^n}(z^n|x^n)}{P_{Z^n}(z^n)} = \log \frac{P_{Y^n|X^n,S}(y^n|x^n, s)}{P_{Y^n|S}(y^n|s)} = i_{X^n W^n}(x^n; y^n|s). \tag{2}$$
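As a numerical illustration of (2), the following Python sketch draws one realization of the normalized information density for a single component channel. It is only a sketch under stated assumptions: the component channel is taken to be a BSC with crossover probability strictly between 0 and 1, the input is i.i.d. equiprobable, and the function names are hypothetical. Under these assumptions the normalized density concentrates around $1 - h(p_s)$ for a fixed state s, so in a composite channel its limiting distribution has one mass point per state.

```python
import math
import random


def binary_entropy(p):
    """Binary entropy h(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def normalized_information_density(n, crossover, rng):
    """One sample of (1/n) i(x^n; y^n | s) for a BSC state (assumed model).

    Assumes i.i.d. uniform inputs, so the output is uniform and each symbol
    contributes log2(P(y|x,s) / (1/2)): that is 1 + log2(1 - p) when the
    symbol is received correctly and 1 + log2(p) when it is flipped.
    Requires 0 < crossover < 1.
    """
    flips = sum(1 for _ in range(n) if rng.random() < crossover)
    total = (n
             + flips * math.log2(crossover)
             + (n - flips) * math.log2(1 - crossover))
    return total / n


if __name__ == "__main__":
    rng = random.Random(0)
    sample = normalized_information_density(100_000, 0.11, rng)
    print(sample, 1 - binary_entropy(0.11))
```

For long blocks the printed sample and the per-state value $1 - h(0.11)$ should be close, which is the concentration that the outage and expected capacity definitions below rely on.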
1) Capacity versus Outage: Consider a sequence of $(n, 2^{nR})$ codes. Let $P_o^{(n)}$ be the probability
that the receiver declares an outage, and $P_e^{(n)}$ be the decoding error probability given that no
outage is declared. We say that a rate R is outage-q achievable if there exists a sequence of
$(n, 2^{nR})$ channel codes such that $\lim_{n\to\infty} P_o^{(n)} \le q$ and $\lim_{n\to\infty} P_e^{(n)} = 0$. The capacity versus outage
$C_q$ is defined to be the supremum over all outage-q achievable rates, and is shown to be [3], [4]

$$C_q = \sup_{X} \sup \left\{ \alpha : \lim_{n\to\infty} \Pr\left[ \frac{1}{n}\, i_{X^n W^n}(X^n; Y^n|S) \le \alpha \right] \le q \right\}. \tag{3}$$



The operational implication of this definition is that the encoder uses a single codebook and 
sends information at a fixed rate $C_q$. Assuming repeated channel use and independent channel
state at each use, the receiver can correctly decode the information a proportion $(1 - q)$ of
the time and turn itself off a proportion q of the time. We further define the outage capacity
$C_q^o = (1 - q)C_q$ as the long-term average rate, which is a meaningful metric if we are only
interested in the fraction of correctly received packets and approximate the unreliable packets
by surrounding samples, or if there is some repetition mechanism where the receiver requests
retransmission of lost information from the sender. The value q can be chosen to maximize the
long-term average throughput $C_q^o$.
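To make the definitions of the capacity versus outage and the outage capacity concrete, the sketch below evaluates both for a finite-state composite BSC. This is an illustration under assumptions that are not stated in the text above: the state set is finite, every component channel is a BSC, and an i.i.d. equiprobable input is optimal for all states simultaneously, so that the normalized information density in state s concentrates at $1 - h(p_s)$. Under these assumptions (3) reduces to a quantile of the per-state capacities. The function names are hypothetical.

```python
import math


def binary_entropy(p):
    """Binary entropy h(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def capacity_versus_outage(states, q):
    """C_q for a finite-state composite BSC (assumed model).

    states: list of (probability, crossover) pairs with positive
    probabilities summing to one; q: outage level with 0 <= q < 1.
    Returns the smallest per-state capacity whose cumulative probability
    (worst states first) strictly exceeds q.
    """
    if not 0.0 <= q < 1.0:
        raise ValueError("q must satisfy 0 <= q < 1")
    caps = sorted((1.0 - binary_entropy(p), prob) for prob, p in states)
    cumulative = 0.0
    for cap, prob in caps:
        cumulative += prob
        if cumulative > q + 1e-12:
            return cap
    return caps[-1][0]


def best_outage_capacity(states):
    """Maximize (1 - q) * C_q; returns (outage capacity, maximizing q).

    C_q is a step function of q, so only q = 0 and the cumulative
    probabilities of the worst states need to be examined.
    """
    caps = sorted((1.0 - binary_entropy(p), prob) for prob, p in states)
    candidates, cumulative = [0.0], 0.0
    for _, prob in caps[:-1]:
        cumulative += prob
        candidates.append(cumulative)
    return max(((1.0 - q) * capacity_versus_outage(states, q), q)
               for q in candidates)


if __name__ == "__main__":
    # Hypothetical two-state example: a useless state with probability 0.1.
    states = [(0.1, 0.5), (0.9, 0.0)]
    print(capacity_versus_outage(states, 0.0))   # dominated by the worst state
    print(capacity_versus_outage(states, 0.1))
    print(best_outage_capacity(states))
```

In this hypothetical example the Shannon capacity ($q = 0$) is zero because of the useless state, while accepting an outage probability of 0.1 recovers the full rate of the good state.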
2) Expected Capacity: This notion provides another strategy for increasing the reliably received
rate. Although the transmitter is forced to use a single encoder at a rate $R_t$ without channel state
information, the receiver can choose from a collection of decoders, each parameterized by s and
decoding at a rate $R_s \le R_t$, based on CSIR. Denote by $P_e^{(n,s)}$ the error probability associated
with channel state s. The expected capacity $C^e$ is the supremum of all achievable rates $E_S R_S$
of any code sequence that has $E_S P_e^{(n,s)}$ approaching zero.

In a composite channel, different channel states can be viewed as virtual receivers, and 
therefore the expected capacity is closely related to the capacity region of a broadcast channel 
(BC). In the broadcast system the channel from the input to the output of receiver s is

$$P_{Y_s^n|X^n}(y_s^n|x^n) = P_{Y^n|X^n,S}(y_s^n|x^n, s).$$

Under certain conditions, it is shown that the expected capacity of a composite channel equals
the maximum weighted sum-rate over the capacity region of the corresponding broadcast channel,
where the weight coefficient is the state probability P(s) [5, Theorem 1]. Using broadcast channel
codes, the expected capacity is derived in [11] for a Gaussian slow-fading channel and in [12]
for a composite BSC.

The expected capacity is a meaningful metric if partial received information is useful. For
example, consider sending an image using a multi-resolution (MR) source code over a composite
channel. Decoding all transmitted information leads to reconstructions with the highest fidelity.
However, in the case of inferior channel quality, it still helps to decode partial information and
get a coarse reconstruction.

B. End-to-End Distortion Metrics

Next we introduce alternative end-to-end distortion metrics as performance measures for
transmission of a stationary ergodic source over a composite channel. We denote by $\mathcal{V}$ the source
alphabet. The source symbols $\{V^n = (V_1^{(n)}, V_2^{(n)}, \ldots, V_n^{(n)})\}_{n=1}^{\infty}$ are generated according to
a sequence of finite-dimensional distributions $P(V^n)$, and then transmitted over a composite
channel $W^n : \mathcal{X}^n \to (\mathcal{Y}^n, \mathcal{S})$ with conditional output distribution

$$W^n(y^n, s|x^n) = P_S(s)\, P_{Y^n|X^n,S}(y^n|x^n, s).$$

It is possible that the source generates symbols at a rate different from the rate at which the
channel transmits symbols, i.e. a length-n source sequence may be transmitted in m channel uses
with $m \ne n$. The channel bandwidth expansion ratio is defined to be b = m/n. For simplicity
we assume b = 1 in this and the next two sections, but the discussions can be easily extended
to general cases with $b \ne 1$. The numerical examples in Section V will explicitly address this
issue.

1) Distortion versus Outage: Here we design an encoder $f_n : \mathcal{V}^n \to \mathcal{X}^n$ that maps the source
sequence to the channel input. Note that the source and channel encoders, whether joint or
separate, do not have access to channel state information S. However, the receiver can declare
an outage with probability $P_o^{(n)}$ based on CSIR. In non-outage states, we design a decoder
$\phi_n : (\mathcal{Y}^n, \mathcal{S}) \to \hat{\mathcal{V}}^n$ that maps the channel output to a source reconstruction. We say a distortion
level D is outage-q achievable if $\lim_{n\to\infty} P_o^{(n)} \le q$ and

$$\lim_{n\to\infty} \Pr\left\{ (V^n, \hat{V}^n) : d(V^n, \hat{V}^n) > D \,\middle|\, \text{no outage} \right\} = 0, \tag{4}$$

where $d(V^n, \hat{V}^n) = \frac{1}{n}\sum_{i=1}^{n} d(V_i, \hat{V}_i)$ is the distortion measure between the source sequence $V^n$
and its reconstruction $\hat{V}^n$. The distortion versus outage $D_q$ is the infimum over all outage-q
achievable distortions. In order to evaluate (4) we need the conditional distribution $P(\hat{V}^n|V^n)$.
Assuming the encoder $f_n$ and the decoder $\phi_n$ are deterministic, this distribution is given by

$$P(\hat{V}^n|V^n) = \sum_{(X^n, Y^n, S)} W^n(Y^n, S|X^n) \cdot 1\{X^n = f_n(V^n),\ \hat{V}^n = \phi_n(Y^n, S)\}. \tag{5}$$

Here $1\{\cdot\}$ is the indicator function. Note that the channel statistics $W^n$ and the source statistics
$P(V^n)$ are fixed, so the code design is essentially the appropriate choice of the outage states
and the encoder-decoder pair $(f_n, \phi_n)$.

2) Expected Distortion: We denote by $D_S$ the achievable average distortion when the channel
is in state S, and it is given by

$$D_S = \lim_{n\to\infty} \sum P(V^n)\, W^n(Y^n|X^n, S)\, d(V^n, \hat{V}^n), \tag{6}$$

where the summation is over all $(V^n, X^n, Y^n, \hat{V}^n)$ such that $X^n = f_n(V^n)$ and $\hat{V}^n = \phi_n(Y^n, S)$.
Notice that the transmitter cannot access channel state information so the encoder $f_n$ is independent
of S; nevertheless the receiver can choose different decoders $\phi_n(\cdot, S)$ based on CSIR.

In a composite channel, each channel state is assumed to be stationary and ergodic, so for
a fixed channel state S we can design source-channel codes such that $d(V^n, \hat{V}^n)$ approaches a
constant limit $D_S$ for large n; however, it is possible that $d(V^n, \hat{V}^n)$ approaches different limits
for different channel states. The expected distortion metric captures the distortion averaged over
various channel states. Using the conditional distribution $P(\hat{V}^n|V^n)$ in (5) and the definition of
$D_S$ in (6), the average distortion can be written as

$$\lim_{n\to\infty} E_{(V^n, \hat{V}^n)}\left[ d(V^n, \hat{V}^n) \right] = \sum_{S} P(S) D_S = E_S D_S. \tag{7}$$

The expected distortion $D^e$ is the infimum of all achievable average distortions.

III. Source-Channel Separation and Interface: A New Perspective 

For transmission of a source over a channel, the system consists of three concatenated blocks:
the encoder $f_n$ that maps the source sequence $V^n$ to the channel input $X^n$; the channel $W^n$ that
maps the channel input $X^n$ to channel output $Z^n$; and the decoder $\phi_n$ that maps the channel
output $Z^n$ to a reconstruction $\hat{V}^n$ of the source sequence. In contrast, a separate source-channel
coding scheme consists of five blocks. The encoder $f_n$ is separated into a source encoder

$$\tilde{f}_n : \mathcal{V}^n \to \mathcal{M}_{n,t} = \{1, 2, \ldots, 2^{nR_t}\}$$

and a channel encoder

$$\hat{f}_n : \mathcal{M}_{n,t} = \{1, 2, \ldots, 2^{nR_t}\} \to \mathcal{X}^n,$$

where the index set $\mathcal{M}_{n,t}$ of size $2^{nR_t}$ serves as both the source encoder output and the channel
encoder input. Equivalently, each index in $\mathcal{M}_{n,t}$ can be viewed as a block of $nR_t$ bits [5, Defn.
5]. The decoder $\phi_n$ is also separated into a channel decoder $\hat{\phi}_n$ and a source decoder $\tilde{\phi}_n$. The
difference between a general system and a separate source-channel coding system is summarized
in Fig. 1.

Separation does not imply isolation - the source and channel encoders and decoders still 
need to agree on certain aspects of their respective designs. There are three interfaces through 
which they exchange information, the negotiation interface, the transmitter interface and the 
receiver interface. For classical Shannon separation schemes with an end-to-end distortion target 
D, these interfaces are summarized in Table I. The negotiation interface is a single rate comparison
between R(D) and C. Since the Shannon capacity definition requires that all transmitted
information be correctly decoded, the transmission rate $R_t$ is the same as the receiving rate
$R_r$. Assuming stationary and ergodic systems, these rates do not depend on the blocklength n.
However, these constraints can be relaxed to include more source-channel transmission strategies
as separation schemes.

[Figure 1: block diagrams. Upper: Encoder $f_n$, Channel $W^n$, Decoder $\phi_n$. Lower: Source ENC, Channel ENC, Channel $W^n$, Channel DEC, Source DEC.]

Fig. 1. Upper: general communication system with three blocks. Lower: separate source-channel coding system with five
blocks.

2 Assuming a bounded distortion measure, the exchange of limit operation and expectation follows from the dominated
convergence theorem.

February 26, 2009 DRAFT

TABLE I

Interface for Shannon separation schemes

Negotiation    source coding rate R(D) and channel Shannon capacity C
Transmitter    $\mathcal{M}_{n,t} = \{1, 2, \ldots, 2^{nR_t}\}$
Receiver       $\mathcal{M}_{n,r} = \{1, 2, \ldots, 2^{nR_r}\}$

In [27] Vembu et al. proposed transmission schemes for non-stationary source and channel 
models. The corresponding interfaces are listed in Table II. Here the negotiation interface is no
longer a single number, but a sequence of source and channel statistics for different blocklengths 
n. The transmission and receiving rates are still the same, but now they depend on the blocklength 
n. 

TABLE II

Interface for Vembu separation schemes

Negotiation    source entropy density $h_{V^n}(v^n)$ and channel information density $i_{X^n W^n}(x^n; z^n)$
Transmitter    $\mathcal{M}_{n,t} = \{1, 2, \ldots, 2^{n c_n}\}$
Receiver       $\mathcal{M}_{n,r} = \{1, 2, \ldots, 2^{n c_n}\}$




In Section IV we propose a separation scheme for transmission of stationary ergodic sources
over composite channels, and prove its optimality under the distortion versus outage metric. The
interfaces of this scheme are shown in Table III. The negotiation interface is still a single
number, but the channel should provide its capacity versus outage-q ($C_q$) [5, Defn. 3] instead
of the Shannon capacity. The receiver interface includes an additional outage indicator. In non-outage
states, the channel decoder recovers the channel input index with negligible error and
delivers it to the source decoder to achieve the end-to-end distortion target $D_q$. In outage states
the channel decoder shuts itself off and nothing passes through the receiver interface.

TABLE III

Interface under distortion versus outage metric

Negotiation    source coding rate $R(D_q)$ and channel capacity versus outage-q ($C_q$)
Transmitter    $\mathcal{M}_{n,t} = \{1, 2, \ldots, 2^{nR}\}$
Receiver       Outage indicator I. For non-outage states $\mathcal{M}_{n,r} = \mathcal{M}_{n,t}$



In Section V we study transmission of a binary symmetric source over a composite BSC
under the expected distortion metric. One of the transmission schemes is to use a multi-resolution
source code and a broadcast channel code, with interfaces defined in Table IV. For the negotiation
interface, the channel provides the channel state probability P(s) and the entire broadcast capacity
region boundary. A point on the boundary is a vector $(R_s)_{s \in \mathcal{S}}$ of achievable rates in each channel
state for a certain BC channel code. Based on the distortion-rate function $D(R_s)$ of its multi-resolution
code, the source then chooses the rate vector $(R_s)$ to minimize the expected distortion
$\sum_s P(s) D(R_s)$. Without channel state information at the transmitter, the size of the index set
$\mathcal{M}_{n,t}$, i.e. the transmitter interface, is fixed. Each index in $\mathcal{M}_{n,t}$ can be viewed as a block of $nR_t$
bits. Different from the Shannon capacity definition, each bit is only required to be successfully
decoded by a subset of channel states, not necessarily all states [5, Defn. 5]. Consequently, the
receiver can choose different decoders based on CSIR, and the receiver interface $\mathcal{M}_{n,s}$ depends
on the channel state s.
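The rate selection just described can be sketched for a two-state composite BSC. The sketch rests on several assumptions that are not taken from the text above: there are only two states with crossover probabilities $p_{good} \le p_{bad} \le 1/2$; the broadcast boundary is the standard superposition-coding region of the degraded binary symmetric broadcast channel, parameterized by $\beta \in [0, 1/2]$; the multi-resolution code for the BSS with Hamming distortion attains $D(R) = h^{-1}(1 - R)$ at every layer; and b = 1. All names are hypothetical, and the grid search simply stands in for the source's choice of a boundary point.

```python
import math


def binary_entropy(p):
    """Binary entropy h(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def inverse_binary_entropy(y, tol=1e-12):
    """Return p in [0, 1/2] with h(p) = y, found by bisection."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binary_entropy(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def binary_convolution(a, b):
    """a * b = a(1 - b) + (1 - a)b."""
    return a * (1 - b) + (1 - a) * b


def bc_boundary_point(beta, p_good, p_bad):
    """(R_bad, R_good) on the assumed superposition-coding boundary."""
    r_base = 1.0 - binary_entropy(binary_convolution(beta, p_bad))
    r_refine = (binary_entropy(binary_convolution(beta, p_good))
                - binary_entropy(p_good))
    # The good state decodes the base layer and the refinement layer.
    return r_base, r_base + r_refine


def distortion_rate_bss(rate):
    """D(R) = h^{-1}(1 - R) for the BSS with Hamming distortion."""
    return inverse_binary_entropy(max(0.0, 1.0 - rate))


def choose_rates(p_good, p_bad, prob_good, grid=1000):
    """Grid search over beta; returns (expected distortion, beta, R_bad, R_good)."""
    best = None
    for k in range(grid + 1):
        beta = 0.5 * k / grid
        r_bad, r_good = bc_boundary_point(beta, p_good, p_bad)
        expected = ((1.0 - prob_good) * distortion_rate_bss(r_bad)
                    + prob_good * distortion_rate_bss(r_good))
        if best is None or expected < best[0]:
            best = (expected, beta, r_bad, r_good)
    return best


if __name__ == "__main__":
    print(choose_rates(p_good=0.05, p_bad=0.2, prob_good=0.7))
```

The returned rate pair plays the role of the vector $(R_s)$ agreed upon through the negotiation interface; the channel side needs no knowledge of the source beyond this choice.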

TABLE IV

Interface under expected distortion metric

Negotiation    achievable distortion with multi-resolution source code $D(R_s)$, broadcast channel capacity region $(R_s)_{s \in \mathcal{S}}$ and corresponding channel state probability P(s)
Transmitter    $\mathcal{M}_{n,t} = \{1, 2, \ldots, 2^{nR_t}\}$
Receiver       $\mathcal{M}_{n,s} = \{1, 2, \ldots, 2^{nR_s}\}$ for channel state s

Although the above schemes differ from each other in their choice of interfaces, all of them
retain the main advantage of separation - modularity. For example, under the distortion versus
outage metric, there is a class of channels which can support rate $C_q$ with probability no less than
$(1 - q)$. As long as $C_q$ exceeds the rate distortion function $R(D_q)$, the source can be transmitted
over any channel within this class and be reconstructed at the destination subject to the distortion
versus outage constraint (4). The source only needs to know $C_q$ to decide whether the constraint
(4) can be satisfied, and the source code design does not depend on any other channel statistics.
We can argue similarly for other transmission schemes. For all of them, the encoder/decoder
can be separated into a source encoder/decoder and a channel encoder/decoder, as illustrated by
the five-block diagram in Fig. 1. A channel code can be explicitly identified in this diagram,
which includes the three blocks in the middle. Note that the channel code might be designed
for generalized capacity definitions, not necessarily for the Shannon capacity definition.

In contrast, joint source-channel coding is a loose label that encompasses all coding techniques
where the source and channel coders are not entirely separated. Consider the example of the
direct transmission of a complex circularly symmetric Gaussian source, which we denote by
$\mathcal{CN}(0, \sigma^2)$, over a Gaussian channel with input power constraint P. The linear encoder
$X = f(V) = \sqrt{P/\sigma^2}\, V$ cannot be separated into a source encoder and a channel encoder. Therefore
this direct transmission is an example of joint source-channel coding.
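A small Monte Carlo sketch of this direct transmission is given below. The linear encoder is the one stated above; everything else is an assumption made for illustration: the channel adds complex circularly symmetric Gaussian noise of variance `noise_var`, the receiver applies a linear MMSE estimator, and the comparison value is the distortion at which $R(D) = \log_2(\sigma^2/D)$ equals the capacity $\log_2(1 + P/N)$. Function and parameter names are hypothetical.

```python
import math
import random


def shannon_limit_distortion(source_var, power, noise_var):
    """Distortion D solving log2(source_var / D) = log2(1 + power / noise_var)."""
    return source_var / (1.0 + power / noise_var)


def complex_gaussian(variance, rng):
    """Sample from CN(0, variance): independent real/imaginary parts."""
    std = math.sqrt(variance / 2.0)
    return complex(rng.gauss(0.0, std), rng.gauss(0.0, std))


def simulate_uncoded(source_var, power, noise_var, num_samples, rng):
    """Empirical mean squared error of uncoded transmission (assumed receiver)."""
    gain = math.sqrt(power / source_var)               # encoder X = gain * V
    coeff = gain * source_var / (power + noise_var)    # linear MMSE coefficient
    total = 0.0
    for _ in range(num_samples):
        v = complex_gaussian(source_var, rng)
        y = gain * v + complex_gaussian(noise_var, rng)
        total += abs(v - coeff * y) ** 2
    return total / num_samples


if __name__ == "__main__":
    rng = random.Random(1)
    print(simulate_uncoded(2.0, 4.0, 1.0, 50_000, rng))
    print(shannon_limit_distortion(2.0, 4.0, 1.0))
```

Under these assumptions the empirical distortion should match the separation-based limit up to sampling error, even though no source code or channel code can be identified in the scheme.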

In Section V we also propose two other schemes, namely systematic coding and
quantization error splitting, for transmission of a binary symmetric source over a composite
BSC. These schemes are applicable because of the specific system setup: the source alphabet
is the same as the channel input alphabet, and they do not apply if the BSC is replaced by
some other channels. We view them as joint source-channel coding schemes because they lack
flexibility and because we cannot identify a three-block channel code as in previous examples.
Nevertheless, the interface concept can be extended to joint source-channel coding schemes. The
interface complexity, together with end-to-end performance, provides two criteria to compare
various schemes. We defer the details to Section V.

IV. Separation Optimality under Distortion versus Outage Metric 

Consider transmission of a finite alphabet stationary ergodic source $\{V_i\}_{i=1}^{\infty}$ over a composite
channel W. In this section we show that the classical Shannon separation theorem can be 
extended to communication systems under the distortion versus outage metric. 

A. Lossless Transmission 

Denote by $C_q$ the channel capacity versus outage-$q$ and by $H(V)$ the source entropy rate 

$$H(V) = \lim_{n\to\infty} \frac{1}{n} H(V_1, V_2, \ldots, V_n).$$

We first consider the case of lossless transmission, i.e. $D = 0$. The distortion versus outage-$q$ 
constraint (4) now simplifies to 

$$\Pr\{(V^n,\hat V^n) : d(V^n,\hat V^n) = 0 \mid \text{no outage}\} = \Pr\{V^n = \hat V^n \mid \text{no outage}\} \to 1$$

as $n$ approaches infinity. 



Theorem 1 For lossless transmission, if $H(V) < C_q$ then there exists a sequence of blocklength-$n$ 
source-channel codes that satisfy the outage-$q$ constraint 

$$\lim_{n\to\infty} P_o^{(n)} \le q, \qquad \lim_{n\to\infty} \Pr\{V^n = \hat V^n \mid \text{no outage}\} = 1; \tag{8}$$

conversely, the existence of source-channel codes that satisfy the above constraints also implies 
$H(V) \le C_q$. 

To prove the direct part, we construct a two-stage encoder $f_n$, which involves a source encoder 
$\tilde f_n$ and a channel encoder $\hat f_n$, and similarly for the decoder $\phi_n$. The converse of Theorem 1 
then guarantees this separate source-channel code essentially achieves optimal performance, i.e. 
performance at least as good as any possible joint coding scheme. The converse of the Shannon 
separation theorem [29, p. 217] is established through Fano's inequality. It is known that Fano's 
inequality fails to provide a tight lower bound for error probability [3], so here we use information 
density to establish the converse for general channel models. 

Proof: In the following we denote $R = H(V)$ and $C = C_q$ to simplify notation. 

Achievability: Fix $\delta > 0$. Since the stationary ergodic source satisfies the asymptotic equipartition 
property (AEP) [29, p. 51], for any $0 < \epsilon < 1$ and sufficiently large $n$, there exists a source 
encoder 

$$\tilde f_n : \mathcal{V}^n \to U \in \{1, 2, \ldots, 2^{n(R+\delta)}\}$$

and a source decoder 

$$\tilde\phi_n : U \in \{1, 2, \ldots, 2^{n(R+\delta)}\} \to \tilde V^n$$

such that $\Pr\{\tilde V^n \ne V^n\} \le \epsilon$. Here $\tilde V^n$ is the decoder output of the stand-alone source code. 
By definition of capacity versus outage [5, Defn. 3], there exist channel codes with a channel 
encoder 

$$\hat f_n : U \in \{1, 2, \ldots, 2^{n(C-\delta)}\} \to \mathcal{X}^n,$$

outage indicator 

$$I : \mathcal{S} \to \{0, 1\},$$

and a channel decoder for non-outage states 

$$\hat\phi_n : Z^n = (Y^n, S) \to \hat U \in \{1, 2, \ldots, 2^{n(C-\delta)}\}$$

such that for sufficiently large $n$, $P_o^{(n)} = \Pr\{I = 0\} \le q + \epsilon$ and $P_e^{(n)} = \Pr\{\hat U \ne U \mid I = 1\} \le \epsilon$. 
For sufficiently small $\delta$ we have $R + \delta < C - \delta$, which guarantees the output of the source 
encoder $\tilde f_n$ always lies in the domain of the channel encoder $\hat f_n$. 

Now we concatenate the source encoder, channel encoder, channel decoder and source decoder 
to form a communication system. We declare an outage for the overall system whenever the 
channel is in outage. For non-outage states, denote by $\hat V^n$ the source reconstruction at the output 
of the overall system, given by $\hat V^n = \tilde\phi_n(\hat\phi_n(Z^n))$ with $Z^n$ the channel output due to the 
channel input $X^n = \hat f_n(\tilde f_n(V^n))$. We have $P_o^{(n)} \le q + \epsilon$ and 

$$\Pr\{\hat V^n = V^n \mid \text{no outage}\} \ge \Pr\{\hat V^n = V^n, \hat U = U \mid I = 1\}$$
$$= \Pr\{\hat U = U \mid I = 1\} \cdot \Pr\{\hat V^n = V^n \mid \hat U = U, I = 1\}$$
$$\ge (1-\epsilon)(1-\epsilon).$$

Since $\epsilon > 0$ is arbitrary, (8) is proved. 
Converse: Notice that 

$$\Pr\{V^n = \hat V^n\} \ge [1 - P_o^{(n)}] \cdot \Pr\{V^n = \hat V^n \mid \text{no outage}\},$$

so the outage-$q$ constraint (8) also implies 

$$\lim_{n\to\infty} \Pr\{V^n = \hat V^n\} \ge 1 - q. \tag{9}$$

The constraint (9) is a weaker condition than (8) since it does not require the outage event to 
be recognized by the decoder. In the following we prove a stronger version of the converse: 
the existence of a source-channel code with encoder $f_n : \mathcal{V}^n \to \mathcal{X}^n$ and decoder 
$\phi_n : Z^n = (Y^n, S) \to \hat V^n$ that satisfies the constraint (9) also implies $H(V) \le C_q$, whether or not 
the outage event is recognized. 



Fix $\gamma > 0$. For any $0 < \epsilon < \gamma$, define the typical set as 

$$A_\epsilon^{(n)} = \left\{ v^n : \left| -\frac{1}{n} \log P_{V^n}(v^n) - R \right| \le \epsilon \right\}. \tag{10}$$

For any $v^n \in \mathcal{V}^n$, define 

$$D(v^n) = \{z^n \in \mathcal{Z}^n : \phi_n(z^n) = v^n\}$$

as the decoding region for $v^n$ and 

$$B(v^n) = \left\{ z^n \in \mathcal{Z}^n : \frac{1}{n}\, i_{X^n W^n}(f_n(v^n); z^n) \le R - 2\gamma \right\}. \tag{11}$$

Then we have 

$$\Pr\left\{ \frac{1}{n}\, i_{X^n W^n}(X^n; Z^n) \le R - 2\gamma \right\} = \sum_{(v^n, z^n)} P_{V^n}(v^n) W^n(z^n | f_n(v^n)) \cdot 1\{z^n \in B(v^n)\}$$
$$= \left( \sum_{\Gamma_1} + \sum_{\Gamma_2} + \sum_{\Gamma_3} \right) P_{V^n}(v^n) W^n(z^n | f_n(v^n)), \tag{12}$$

where $1\{\cdot\}$ is the indicator function. In (12) we divide the summation into three regions 

$$\Gamma_1 = \{(v^n, z^n) : v^n \notin A_\epsilon^{(n)},\ z^n \in B(v^n)\},$$
$$\Gamma_2 = \{(v^n, z^n) : v^n \in A_\epsilon^{(n)},\ z^n \in B(v^n) \cap D(v^n)\},$$
$$\Gamma_3 = \{(v^n, z^n) : v^n \in A_\epsilon^{(n)},\ z^n \in B(v^n) \cap D^c(v^n)\},$$

where $D^c(v^n)$ is the complement of the decoding region $D(v^n)$. We can bound the summation 
over each region as follows. For the first term, we have 

$$\sum_{\Gamma_1} P_{V^n}(v^n) W^n(z^n | f_n(v^n)) \le 1 - P_{V^n}\{A_\epsilon^{(n)}\} \le \epsilon \tag{13}$$

for sufficiently large $n$ as a result of the AEP [29, p. 52]. For the second term, we have 

$$P_{V^n}(v^n) \le 2^{-n(R-\epsilon)} \le 2^{-n(R-\gamma)}, \tag{14}$$
$$W^n(z^n | f_n(v^n)) \le 2^{n(R-2\gamma)} P_{Z^n}(z^n) \tag{15}$$

for any $(v^n, z^n) \in \Gamma_2$, where (14) is a property of the typical set $A_\epsilon^{(n)}$ in (10), and (15) is obtained 
from (11) and the definition of information density. The decoding regions of different $v^n$ do 
not overlap, and therefore 

$$\sum_{\Gamma_2} P_{V^n}(v^n) W^n(z^n | f_n(v^n)) \le 2^{-n\gamma} \sum_{\Gamma_2} P_{Z^n}(z^n) \le 2^{-n\gamma}. \tag{16}$$

For the third term, 

$$\sum_{\Gamma_3} P_{V^n}(v^n) W^n(z^n | f_n(v^n)) \le \sum_{v^n} P_{V^n}(v^n) W^n(D^c(v^n) | f_n(v^n)) = \Pr\{\hat V^n \ne V^n\}. \tag{17}$$
Combining (12)-(13) and (16)-(17), we obtain 

$$\Pr\{\hat V^n \ne V^n\} \ge \Pr\left\{ \frac{1}{n}\, i_{X^n W^n}(X^n; Z^n) \le R - 2\gamma \right\} - 2^{-n\gamma} - \epsilon.$$

Let $\epsilon \to 0$ and $n \to \infty$. Since the constraint (9) requires the error probability of the source-channel 
code to be upper bounded by $q$, we conclude 

$$\lim_{n\to\infty} \Pr\left\{ \frac{1}{n}\, i_{X^n W^n}(X^n; Z^n) \le R - 2\gamma \right\} \le q.$$

Since $\gamma > 0$ is arbitrary, by definition of $C_q$ we must have $H(V) = R \le C_q$. 

B. Lossy Transmission 

For the case of lossy transmission ($D > 0$), we focus on discrete memoryless sources (DMS) 
$\{V_i\}_{i=1}^{\infty}$ and recall the definition of the source rate-distortion function [29, p. 342] 

$$R(D) = \min_{P(\hat V | V):\ E\,d(V,\hat V) \le D} I(V; \hat V). \tag{18}$$

Extensions to sources with memory follow the procedures in [25, Sec. 7.2]. Occasionally we 
also use the notation $R(V, D)$ to specify the source distribution. For discrete memoryless source 
and channel models, it is shown that if $R(D) < C$ then the source can be transmitted over the 
channel subject to an average fidelity criterion 

$$E\{d(V^n, \hat V^n)\} \le D. \tag{19}$$

Conversely, if the transmission satisfies the average fidelity criterion, we also conclude $R(D) \le C$ 
[20, p. 130]. Next we consider composite channel models and generalized distortion metrics. 

Theorem 2 Denote by $R(D_q)$ the rate-distortion function (18) of a discrete i.i.d. source evaluated 
at distortion level $D_q$. If $R(D_q) < C_q$ the source can be transmitted over a composite 
channel subject to the outage constraint (4) 

$$\lim_{n\to\infty} P_o^{(n)} \le q, \qquad \lim_{n\to\infty} \Pr\{(V^n, \hat V^n) : d(V^n, \hat V^n) > D_q \mid \text{no outage}\} = 0;$$

conversely, the existence of source-channel codes that satisfy the above constraints also implies 
$R(D_q) \le C_q$. 

The proof of the direct part of Theorem 2 is similar to that of Theorem 1. The new element 
is a change from lossless source coding to lossy source coding. In the rate distortion theory 
for source coding, one often imposes the average fidelity criterion $E\{d(V^n, \hat V^n)\} \le D$, where 
$\hat V^n$ is the source reconstruction sequence. The main challenge here is to satisfy the condition 
(4), which is based on the tail of the distortion distribution rather than on its mean. So for 
source coding, instead of the global average fidelity criterion (19), we impose the following 
local $\epsilon$-fidelity criterion [20, p. 123] 

$$\Pr\{(V^n, \hat V^n) : d(V^n, \hat V^n) \le D\} \ge 1 - \epsilon. \tag{20}$$



February 26, 2009 



DRAFT 






It is well known that for any $\delta > 0$ there exist source codes with rate $R < R(D) + \delta$ which 
satisfy the average fidelity criterion (19) [30, p. 351]. To prove the direct part of Theorem 2 we 
need a stronger result [20, p. 125]: for any $0 < \epsilon < 1$ and $\delta > 0$, there exists a source encoder 

$$\tilde f_n : \mathcal{V}^n \to U \in \{1, 2, \ldots, 2^{n[R(D)+\delta]}\}$$

and a source decoder 

$$\tilde\phi_n : U \in \{1, 2, \ldots, 2^{n[R(D)+\delta]}\} \to \hat V^n$$

such that $\Pr\{d(V^n, \hat V^n) \le D\} \ge 1 - \epsilon$. We can then construct channel codes for capacity versus 
outage-$q$ and concatenate them with the $\epsilon$-fidelity source code to satisfy the outage constraint (4), 
similarly as in Theorem 1. 



Next we consider the converse of Theorem 2. Similar to the case of lossless transmission, we 
prove a stronger version of the converse which does not require outage events to be recognized 
by the decoder. Notice that, with $D = D_q$, 

$$\Pr\{(V^n, \hat V^n) : d(V^n, \hat V^n) \le D\} \ge [1 - P_o^{(n)}] \cdot \Pr\{d(V^n, \hat V^n) \le D \mid \text{no outage}\},$$

so the outage constraint (4) implies 

$$\lim_{n\to\infty} \Pr\{(V^n, \hat V^n) : d(V^n, \hat V^n) \le D\} \ge 1 - q. \tag{21}$$

We show the constraint (21) also implies $R(D_q) \le C_q$. 

A brief review of the converse of the Shannon separation theorem [20, p. 130] helps to highlight 
the new challenges here. For transmission of a DMS over a DMC under the average fidelity 
criterion (19), the converse is established through the following chain of inequalities 

$$C \ge \frac{1}{n} I(X^n; Z^n) \tag{22}$$
$$\ge \frac{1}{n} I(V^n; \hat V^n) \tag{23}$$
$$\ge R(D), \tag{24}$$

where (22) is a result of [29, Lemma 8.9.2], (23) is from the Markov-chain relationship 
$V^n \to X^n \to Z^n \to \hat V^n$ and the data processing inequality [29, Theorem 2.8.1], and (24) is from the 
convexity of the rate-distortion function [29, p. 350]. 

We face two problems when trying to extend the previous approach to composite channel 
models. First, the capacity versus outage-$q$ is defined through information density instead of 
mutual information, and the data processing inequality does not have a counterpart in terms of 
information density. Hence we need to refine the lower bound of error probability in terms of 
information density, following an approach similar to the lossless case. 

Second, the rate distortion function (18) is defined through an average fidelity criterion but 
the source and its reconstruction satisfy the $q$-fidelity criterion (21). In this regard we consider 
the joint type [29, p. 279] or empirical probability distribution $\tilde P(V_*, \hat V_*)$ induced by a pair of 
sequences $(v^n, \hat v^n)$, where $v^n$ is a strongly typical sequence [20, p. 33] and $\hat v^n$ is the reconstruction 
sequence satisfying $d(v^n, \hat v^n) \le D$. Briefly speaking, by definition of joint type the distribution 
$\tilde P$ satisfies the average fidelity criterion $E\,d(V_*, \hat V_*) \le D$. By definition of strong typicality the 
marginal distribution $\tilde P(V_*)$ is "close" to the true source distribution $P(V)$, so the corresponding 
rate-distortion functions $R(V_*, D)$ and $R(V, D)$ are also "close" to each other by continuity. This 
idea is formalized in the next proof, prior to which we must define the notion of a strongly typical 
sequence: 

Definition 1 [20, p. 33] For a random variable $V$ with alphabet $\mathcal{V}$ and distribution $p(v)$, a 
sequence $v^n \in \mathcal{V}^n$ is said to be $\delta$-strongly typical if 
• for all $a \in \mathcal{V}$ with $p(a) > 0$, 

$$\left| \frac{1}{n} N(a|v^n) - p(a) \right| < \delta;$$

• for all $a \in \mathcal{V}$ with $p(a) = 0$, $N(a|v^n) = 0$. 
$N(a|v^n)$ is the number of occurrences of the symbol $a$ in $v^n$. 

The set of such sequences will be denoted by $T^n_{[V]_\delta}$ or $T^n_\delta(V)$, or simply $T^n_\delta$. Let $V_i$, $1 \le i \le n$, 
be drawn i.i.d. according to $p(v)$. Following the strong law of large numbers, it is seen that for 
any $\epsilon > 0$, $\delta > 0$ and sufficiently large $n$, we have 

$$\Pr\{V^n \in T^n_{[V]_\delta}\} \ge 1 - \epsilon.$$

By definition of strong typicality, for any sequence $v^n \in T^n_{[V]_\delta}$ we also have 

$$P_{V^n}(v^n) \le 2^{-n[H(V) - \delta']}, \tag{25}$$

where 

$$\delta' = -\delta \sum_{a:\,p(a)>0} \log p(a) > 0.$$

The upper bound (25) is an immediate result by noticing that 

$$\log P_{V^n}(v^n) = \sum_{a:\,p(a)>0} N(a|v^n) \log p(a)$$

and $v^n \in T^n_{[V]_\delta}$ implies $N(a|v^n) \ge n[p(a) - \delta]$. 
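
As a numerical sanity check of the bound (25), the following minimal sketch enumerates all sequences of a short block length for a small binary source, keeps the δ-strongly typical ones, and verifies the bound for each of them. The source distribution, block length, value of δ and the helper names are illustrative assumptions and do not come from the text.

```python
import itertools
import math

def is_strongly_typical(seq, p, delta):
    """Check delta-strong typicality of seq with respect to the pmf p (a dict)."""
    n = len(seq)
    for a, pa in p.items():
        count = seq.count(a)
        if pa == 0:
            if count != 0:
                return False
        elif abs(count / n - pa) >= delta:
            return False
    return True

def check_bound(p, n, delta):
    """Return (number of typical sequences, whether bound (25) holds for all)."""
    support = [a for a, pa in p.items() if pa > 0]
    entropy = -sum(p[a] * math.log2(p[a]) for a in support)
    delta_prime = -delta * sum(math.log2(p[a]) for a in support)
    bound = 2.0 ** (-n * (entropy - delta_prime))
    typical, holds = 0, True
    for seq in itertools.product(list(p), repeat=n):
        if is_strongly_typical(seq, p, delta):
            typical += 1
            prob = math.prod(p[a] for a in seq)
            # small relative slack guards against floating-point rounding
            holds = holds and prob <= bound * (1 + 1e-12)
    return typical, holds

# Assumed toy source: Bernoulli(0.3), n = 12, delta = 0.1
count, holds = check_bound({0: 0.7, 1: 0.3}, n=12, delta=0.1)
```

The exhaustive enumeration only scales to small $n$; it is meant to illustrate the inequality, not the asymptotic statement.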



The definition of a strongly typical sequence can be extended to jointly distributed variables. 

Definition 2 [29, p. 359] A pair of sequences $(v^n, \hat v^n) \in \mathcal{V}^n \times \hat{\mathcal{V}}^n$ is said to be $\delta$-strongly typical 
with respect to the distribution $p(v, \hat v)$ on $\mathcal{V} \times \hat{\mathcal{V}}$ if 
• for all $(a, b) \in \mathcal{V} \times \hat{\mathcal{V}}$ with $p(a, b) > 0$ we have 

$$\left| \frac{1}{n} N(a, b | v^n, \hat v^n) - p(a, b) \right| < \delta;$$

• for all $(a, b) \in \mathcal{V} \times \hat{\mathcal{V}}$ with $p(a, b) = 0$, $N(a, b | v^n, \hat v^n) = 0$. 
$N(a, b | v^n, \hat v^n)$ is the number of occurrences of the pair $(a, b)$ in the pair of sequences $(v^n, \hat v^n)$. 

The set of such sequences will be denoted by $T^n_{[V\hat V]_\delta}$, or $T^n_\delta(V, \hat V)$, or $T^n_\delta$ if the variables are 
clear from context. 



Proof of Theorem 2: In the following we denote $R = R(D_q)$, $D = D_q$ and $C = C_q$ to simplify 
notation. 

Converse: Consider a source-channel code with encoder $f_n : \mathcal{V}^n \to \mathcal{X}^n$ and decoder 
$\phi_n : Z^n = (Y^n, S) \to \hat V^n$ that satisfy the outage constraint (21). We assume both the encoder and 
the decoder are deterministic. 

Fix $\gamma > 0$. Consider $0 < \epsilon < (\gamma/4)$ and 

$$0 < \delta < \frac{\epsilon}{-\sum_{a:\,p(a)>0} \log p(a)}.$$

From (25), for any $v^n \in T^n_{[V]_\delta}$ the choice of $\delta$ ensures 

$$P_{V^n}(v^n) \le 2^{-n[H(V) - \epsilon]}.$$

For each $v^n \in \mathcal{V}^n$, define 

$$D(v^n) = \{z^n \in \mathcal{Z}^n : d(v^n, \phi_n(z^n)) \le D\}$$

as the set of channel outputs which are mapped to valid source reconstructions, i.e. those within 
distortion $D$ of the original source sequence $v^n$. We also define 

$$B(v^n) = \left\{ z^n \in \mathcal{Z}^n : \frac{1}{n}\, i_{X^n W^n}(f_n(v^n); z^n) \le R - 2\gamma \right\}.$$

Next we derive an upper bound on the probability of valid pairs of sequences. We have 

$$\Pr\{d(V^n, \hat V^n) \le D\} = \sum_{(v^n, z^n)} P_{V^n}(v^n) W^n(z^n | f_n(v^n)) \cdot 1\{z^n \in D(v^n)\}$$
$$= \left( \sum_{\Gamma_1} + \sum_{\Gamma_2} + \sum_{\Gamma_3} \right) P_{V^n}(v^n) W^n(z^n | f_n(v^n)). \tag{26}$$

In (26) we divide the summation into three regions 

$$\Gamma_1 = \{(v^n, z^n) : v^n \notin T^n_{[V]_\delta},\ z^n \in D(v^n)\},$$
$$\Gamma_2 = \{(v^n, z^n) : v^n \in T^n_{[V]_\delta},\ z^n \in B(v^n) \cap D(v^n)\},$$
$$\Gamma_3 = \{(v^n, z^n) : v^n \in T^n_{[V]_\delta},\ z^n \in B^c(v^n) \cap D(v^n)\},$$

where $B^c(v^n)$ is the complement of the region $B(v^n)$. We can bound the summation over each 
region as follows. For sufficiently large $n$, the first term is bounded by 

$$\sum_{\Gamma_1} P_{V^n}(v^n) W^n(z^n | f_n(v^n)) \le 1 - P_{V^n}(T^n_{[V]_\delta}) \le \epsilon. \tag{27}$$

In the second term, for any $(v^n, z^n) \in \Gamma_2$ we have 

$$P_{V^n}(v^n) \le 2^{-n[H(V) - \epsilon]},$$
$$W^n(z^n | f_n(v^n)) \le 2^{n(R - 2\gamma)} P_{Z^n}(z^n),$$

therefore 

$$\sum_{\Gamma_2} P_{V^n}(v^n) W^n(z^n | f_n(v^n)) \le 2^{-n[H(V) - \epsilon - R + 2\gamma]} \sum_{\Gamma_2} P_{Z^n}(z^n). \tag{28}$$

Notice that, in contrast to the lossless case, the regions $D(v^n)$ are not necessarily disjoint; hence 
the summation in (28) may count the same sequence $z^n$ more than once, namely once for every 
$v^n \in T^n_{[V]_\delta}$ satisfying $d(v^n, \phi_n(z^n)) \le D$. In the following we give an upper bound on this repeated counting. 

For any $(v^n, z^n) \in \Gamma_2$ and the corresponding decoder output $\hat v^n = \phi_n(z^n)$, we define a pair 
of random variables $(V, \hat V)$ with joint distribution 

$$\tilde P(a, b) = P_{v^n, \hat v^n}(a, b) = N(a, b | v^n, \hat v^n)/n,$$

where $N(a, b | v^n, \hat v^n)$ is the number of occurrences of the pair $(a, b)$ in the pair of sequences 
$(v^n, \hat v^n)$. $\tilde P$ is also called the joint type or empirical probability distribution of $(v^n, \hat v^n)$ [29, 
p. 279]. Since for every $(a, b) \in \mathcal{V} \times \hat{\mathcal{V}}$ there are at most $(n + 1)$ possible values $\{0, 1, \ldots, n\}$ 
for $N(a, b | v^n, \hat v^n)$, the number of different types is upper bounded by $(n + 1)^{|\mathcal{V}||\hat{\mathcal{V}}|}$. 
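
The type-counting bound can be checked by brute force for a tiny example. The sketch below, under the assumption of binary source and reconstruction alphabets and a very short block length (both chosen only for illustration), enumerates every pair of sequences, collects the distinct joint types, and compares their number with $(n+1)^{|\mathcal{V}||\hat{\mathcal{V}}|}$. The function names are hypothetical.

```python
from itertools import product

def joint_type(v, v_hat):
    """Joint type of (v, v_hat), stored as a sorted tuple of ((a, b), count)."""
    counts = {}
    for pair in zip(v, v_hat):
        counts[pair] = counts.get(pair, 0) + 1
    return tuple(sorted(counts.items()))

def count_joint_types(alphabet_v, alphabet_v_hat, n):
    """Number of distinct joint types over all pairs of length-n sequences."""
    types = set()
    for v in product(alphabet_v, repeat=n):
        for v_hat in product(alphabet_v_hat, repeat=n):
            types.add(joint_type(v, v_hat))
    return len(types)

n = 4
num_types = count_joint_types((0, 1), (0, 1), n)
bound = (n + 1) ** (2 * 2)  # (n+1)^(|V| * |V_hat|) for binary alphabets
```

The exact count is much smaller than the bound, which is all the proof needs: the bound grows only polynomially in $n$.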

For every fixed $\hat v^n$, the number of sequences $v^n \in \mathcal{V}^n$ with joint type $\tilde P$ is upper bounded 
by $2^{nH(V|\hat V)}$ [20, Lemma 1.2.5]. When ranging over $(v^n, z^n) \in \Gamma_2$, we can choose the pair 
of sequences $(v^n_*, z^n_*)$, the corresponding decoder output $\hat v^n_*$ and the pair of induced random 
variables $(V_*, \hat V_*)$ that maximizes $H(V|\hat V)$. So the repeated counting for each fixed $z^n$ is upper 
bounded by 

$$(n + 1)^{|\mathcal{V}||\hat{\mathcal{V}}|}\, 2^{nH(V_*|\hat V_*)}$$

and we continue (28) to obtain 

$$\sum_{\Gamma_2} P_{V^n}(v^n) W^n(z^n | f_n(v^n)) \le (n + 1)^{|\mathcal{V}||\hat{\mathcal{V}}|} \cdot 2^{-n[H(V) - \epsilon - R + 2\gamma - H(V_*|\hat V_*)]} \sum_{z^n} P_{Z^n}(z^n)$$
$$\le (n + 1)^{|\mathcal{V}||\hat{\mathcal{V}}|} \cdot 2^{-n[H(V) - H(V_*) + I(V_*; \hat V_*) - R + 2\gamma - \epsilon]}. \tag{29}$$

For sufficiently large $n$ we have 

$$(n + 1)^{|\mathcal{V}||\hat{\mathcal{V}}|} \le 2^{n\epsilon}. \tag{30}$$

Obviously $v^n_* \in T^n_{[V]_\delta}$, so for any letter $a$ in the alphabet $\mathcal{V}$ we have $|P_{V_*}(a) - p(a)| < \delta$. By 
continuity of the entropy function, 

$$|H(V) - H(V_*)| \le \epsilon \tag{31}$$

for sufficiently small $\delta$. Since $E\,d(V_*, \hat V_*) = d(v^n_*, \hat v^n_*) \le D$, by definition of the rate-distortion 
function $I(V_*; \hat V_*) \ge R(V_*, D)$, where the notation $R(V_*, D)$ emphasizes that the source distribution 
is $P_{V_*}$. Furthermore, the rate-distortion function is continuous with respect to the source 
distribution [20, p. 124], so for sufficiently small $\delta$ 

$$R = R(V, D) \le R(V_*, D) + \epsilon \le I(V_*; \hat V_*) + \epsilon. \tag{32}$$

Combining (29)-(32) and noticing that $0 < \epsilon < (\gamma/4)$, we obtain 

$$\sum_{\Gamma_2} P_{V^n}(v^n) W^n(z^n | f_n(v^n)) \le 2^{-n\gamma}. \tag{33}$$



For the third term, 

$$\sum_{\Gamma_3} P_{V^n}(v^n) W^n(z^n | f_n(v^n)) \le \sum_{v^n} P_{V^n}(v^n) W^n(B^c(v^n) | f_n(v^n))$$
$$= 1 - \Pr\left\{ \frac{1}{n}\, i_{X^n W^n}(X^n; Z^n) \le R - 2\gamma \right\}. \tag{34}$$

Since the source-channel code satisfies the outage distortion constraint (21), from (27), (33) and 
(34), for sufficiently large $n$ 

$$1 - q - \epsilon \le \Pr\{d(V^n, \hat V^n) \le D\} \le \epsilon + 2^{-n\gamma} + 1 - \Pr\left\{ \frac{1}{n}\, i_{X^n W^n}(X^n; Z^n) \le R - 2\gamma \right\}.$$

Letting $\epsilon \to 0$ and $n \to \infty$, we conclude 

$$\lim_{n\to\infty} \Pr\left\{ \frac{1}{n}\, i_{X^n W^n}(X^n; Z^n) \le R - 2\gamma \right\} \le q,$$

which, by definition of $C_q$, implies $R = R(D_q) \le C_q$. 

Note that although Theorem 2 is derived for sources with finite alphabets and bounded 
distortion measures, the result can be generalized to continuous-alphabet sources and unbounded 
distortion measures using the technique of [31, Ch. 7]. 

For our strategy the outage states are recognized by the receiver, which can request a 
retransmission or simply reconstruct the source symbol by its mean, in which case the distortion is the 
variance of the source symbol. If we concatenate the source codes in the direct parts of Theorems 
1 and 2 with a channel code based on $\epsilon$-capacity [3], the relaxed constraints (9) and (21) can 
still be satisfied. However, there is a subtle difference. The receiver cannot recognize the outage 
events in the latter strategy and the reconstruction based on the decoded symbols, possibly in 
error, may lead to large distortions. 

C. Example: Transmission of a Gaussian Source over a Slowly Fading Gaussian Channel 



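[Figure: block diagram. Source $V^n \sim \mathcal{CN}(0, \sigma^2)$, encoder $f_n$, channel input $X^n$, channel with fading distribution $p(\gamma)$, decoder $\phi_n$, reconstruction $\hat V^n$.] 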



Fig. 2. Transmission of Gaussian source over slow-fading Gaussian channels 



1) Distortion versus Outage Metric: We illustrate the separate source and channel codes 
constructed in Theorem 2 by the following example. As shown in Fig. 2, a Gaussian source 
$\mathcal{CN}(0, \sigma^2)$ is transmitted over a Rayleigh slow-fading Gaussian channel with fading distribution 
$p(\gamma) = (1/\bar\gamma)\, e^{-\gamma/\bar\gamma}$, where $\bar\gamma$ is the average channel power gain. The transmitter has a power 
constraint $P$. The additive Gaussian noise is i.i.d. and normalized to have unit variance. The 
channel realization is known only to the receiver, not to the transmitter. In this example we 
index each channel by the power gain $\gamma$, which has the same role as the previous channel index 
$s$. We consider the case where the source block length is the same as the channel block length, 
i.e. the bandwidth expansion ratio $b$ equals 1. 

For an outage probability $q$ the corresponding threshold of channel gain is $\gamma_q = -\bar\gamma \log(1 - q)$, 
so in non-outage states the channel can support a rate of 

$$C_q = \log(1 + P\gamma_q) = \log[1 - P\bar\gamma \log(1 - q)]. \tag{35}$$

The rate distortion function of a complex Gaussian source is given by $R(D_q) = \log(\sigma^2 / D_q)$. 
From Theorem 2, if 

$$\sigma^2 / D_q < 1 - P\bar\gamma \log(1 - q), \tag{36}$$

then the outage constraint (4) can be satisfied by concatenation of a source code at rate $R(D_q)$ 
and a channel code at rate $C_q$. 
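
The quantities in (35) and (36) are straightforward to evaluate numerically. The following minimal sketch assumes Rayleigh fading (exponentially distributed power gain) and natural logarithms, so rates are in nats; the function and parameter names are illustrative and not part of the original development.

```python
import math

def outage_threshold(q, mean_gain):
    """Gain threshold gamma_q with Pr{gamma < gamma_q} = q (exponential gain)."""
    return -mean_gain * math.log(1.0 - q)

def capacity_vs_outage(q, power, mean_gain):
    """C_q in nats per channel use, as in (35)."""
    return math.log(1.0 + power * outage_threshold(q, mean_gain))

def distortion_limit(q, power, mean_gain, variance):
    """Distortion level at which R(D_q) = log(variance / D_q) equals C_q.
    By (36), any D_q strictly above this value is achievable in non-outage states."""
    return variance / (1.0 + power * outage_threshold(q, mean_gain))

# Example with assumed values: outage probability 10%, P = 10, unit mean gain
c_q = capacity_vs_outage(0.1, power=10.0, mean_gain=1.0)
d_q = distortion_limit(0.1, power=10.0, mean_gain=1.0, variance=1.0)
```

Increasing $q$ raises the threshold $\gamma_q$ and hence $C_q$, which lowers the distortion attainable in non-outage states at the cost of more frequent outages.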

It is well known that the uncoded scheme is optimal for transmission of a Gaussian source 
over a Gaussian channel when the bandwidth expansion ratio $b = 1$ [19], [24]. The optimality 
is in the sense that a linear code $X = \sqrt{P/\sigma^2}\,V$ can achieve the minimum distortion 

$$D(\gamma) = \frac{\sigma^2}{1 + P\gamma} \tag{37}$$

for each channel state $\gamma$. It is easily seen that the optimal uncoded scheme also requires (36) to 
satisfy the outage distortion constraint. In summary, a separate source-channel coding scheme 
meets the outage constraint (4) if $R(D_q) < C_q$; if $R(D_q) > C_q$ then the constraint can never be 
satisfied even for optimal joint source-channel coding. The result can be extended to slow-fading 
Gaussian channels with any fading distribution $p(\gamma)$. 
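
The per-state distortion $\sigma^2/(1+P\gamma)$ of the uncoded scheme can be illustrated with a short Monte Carlo experiment. The sketch below is only a sanity check under stated assumptions: the receiver knows the gain $\gamma$, the channel phase is already compensated so the channel multiplies the input by $\sqrt{\gamma}$, and the decoder is the linear MMSE estimator. These modeling choices, the sample size and the function name are assumptions of this example.

```python
import math
import random

def simulate_uncoded(power, gain, variance, num_samples=100_000, seed=0):
    """Empirical mean squared error of uncoded transmission with LMMSE decoding."""
    rng = random.Random(seed)
    scale = math.sqrt(power / variance)  # linear encoder X = scale * V
    # LMMSE coefficient for V given Y = sqrt(gain) * X + N, with unit-variance N
    coeff = scale * math.sqrt(gain) * variance / (1.0 + power * gain)
    src_std = math.sqrt(variance / 2.0)  # per real dimension of the source
    noise_std = math.sqrt(0.5)           # per real dimension of the noise
    total = 0.0
    for _ in range(num_samples):
        v = complex(rng.gauss(0.0, src_std), rng.gauss(0.0, src_std))
        noise = complex(rng.gauss(0.0, noise_std), rng.gauss(0.0, noise_std))
        y = math.sqrt(gain) * scale * v + noise
        total += abs(v - coeff * y) ** 2
    return total / num_samples

empirical = simulate_uncoded(power=4.0, gain=0.5, variance=1.0)
theoretical = 1.0 / (1.0 + 4.0 * 0.5)  # sigma^2 / (1 + P * gamma)
```

Because the encoder does not depend on $\gamma$, the same transmitter attains this distortion in every channel state, which is what makes the scheme attractive when the state is unknown at the transmitter.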

2) Expected Distortion Metric: Unlike the distortion versus outage metric, source-channel 
separation does not hold for the expected distortion metric. In the following we analyze the 
expected distortion of optimal uncoded schemes and separate source-channel coding schemes. 

Optimal joint source-channel coding: The uncoded scheme with a direct mapping 
$X = \sqrt{P/\sigma^2}\,V$ can achieve the minimum distortion (37) for each channel state $\gamma$, and hence the 
optimal expected distortion 

$$E_\gamma\!\left[ \frac{\sigma^2}{1 + P\gamma} \right] = \int_0^\infty \frac{\sigma^2}{1 + P\gamma}\, p(\gamma)\, d\gamma. \tag{38}$$
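
The expected distortion of the uncoded scheme, i.e. the average of $\sigma^2/(1+P\gamma)$ over the fading distribution, can be approximated by simple numerical integration. The sketch below assumes Rayleigh fading with an exponential power-gain density and uses a midpoint rule on a truncated interval; the truncation point, step count and function name are arbitrary illustrative choices.

```python
import math

def expected_distortion_uncoded(power, mean_gain, variance=1.0,
                                upper=40.0, steps=200_000):
    """Midpoint-rule approximation of E[variance / (1 + P * gamma)] for an
    exponential gain density with the given mean, truncated at upper * mean."""
    width = upper * mean_gain / steps
    total = 0.0
    for k in range(steps):
        gamma = (k + 0.5) * width
        density = math.exp(-gamma / mean_gain) / mean_gain
        total += variance / (1.0 + power * gamma) * density * width
    return total

# Assumed example: 10 dB transmit power, unit mean gain, unit source variance
d_uncoded = expected_distortion_uncoded(power=10.0, mean_gain=1.0)
```

By Jensen's inequality the result is never smaller than $\sigma^2/(1+P\bar\gamma)$, the distortion of a non-fading channel with the same average gain.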



Separation scheme with channel code for capacity versus outage: Consider using a channel 
code at rate $C_q$ for capacity versus outage and a source code at the same rate. With probability 
$q$ the channel is in outage, so the receiver estimates the transmitted source symbols by their mean 
to achieve a distortion of $\sigma^2$. With probability $(1 - q)$ the channel can support the rate $C_q$ and 
the end-to-end distortion is $D_q = D(C_q)$. The overall expected distortion is averaged over the 
non-outage and outage states, i.e. $D_1^e(q) = q\sigma^2 + (1 - q)D_q$. 

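The minimum achievable distortion of this strategy is obtained by optimizing $D_1^e(q)$ over 
$q \in (0, 1)$, i.e. 

$$\min_{0<q<1} D_1^e(q) = \min_{0<q<1}\ q\sigma^2 + \frac{(1 - q)\sigma^2}{1 - P\bar\gamma \log(1 - q)}. \tag{39}$$

The solution is to use a channel code with outage probability 

$$q_D^* = 1 - \exp\left\{ -\frac{\sqrt{1 + 4P\bar\gamma} - 1}{2P\bar\gamma} \right\}. \tag{40}$$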
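
The minimization over the outage probability can be checked numerically. In the sketch below, the closed-form candidate is obtained by substituting $t = -\log(1-q)$ into the expected distortion $q\sigma^2 + (1-q)\sigma^2/(1 - P\bar\gamma\log(1-q))$ and setting the derivative to zero, which gives $P\bar\gamma\, t^2 + t - 1 = 0$; this derivation is a worked step of this example. A brute-force grid search serves as an independent check. Function names and grid resolution are assumptions.

```python
import math

def expected_distortion_outage(q, power, mean_gain, variance=1.0):
    """q * var + (1 - q) * var / (1 - P * mean_gain * log(1 - q))."""
    a = power * mean_gain
    return variance * (q + (1.0 - q) / (1.0 - a * math.log(1.0 - q)))

def optimal_outage_closed_form(power, mean_gain):
    """Stationary point: with t = -log(1 - q), solve a*t^2 + t - 1 = 0 for t > 0."""
    a = power * mean_gain
    t = (math.sqrt(1.0 + 4.0 * a) - 1.0) / (2.0 * a)
    return 1.0 - math.exp(-t)

def optimal_outage_grid(power, mean_gain, steps=100_000):
    """Brute-force minimizer over a uniform grid in (0, 1)."""
    grid = (k / steps for k in range(1, steps))
    return min(grid, key=lambda q: expected_distortion_outage(q, power, mean_gain))

q_closed = optimal_outage_closed_form(10.0, 1.0)
q_grid = optimal_outage_grid(10.0, 1.0)
```

The two values agree to within the grid resolution, and the optimal outage probability is strictly positive: tolerating some outage lowers the average distortion.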



One might be tempted to think that the channel should optimize its outage capacity, 

$$C_q^o = (1 - q)C_q = (1 - q) \log[1 - P\bar\gamma \log(1 - q)], \tag{41}$$

defined as the rate averaged over outage and non-outage states [5], and provide $(q_C^*, C_{q_C^*})$ as the 
interface to the source, where $q_C^*$ is the argument that maximizes (41). In fact the solution 

$$q_C^* = 1 - \exp\left\{ -\frac{e^{W(P\bar\gamma)} - 1}{P\bar\gamma} \right\},$$

with $W(z)$ the Lambert-W function solving $z = W(z)e^{W(z)}$, is in general different from $q_D^*$ in 
(40). It is insufficient for the channel to provide only $(q_C^*, C_{q_C^*})$ as the interface; instead it should 
provide the entire $(q, C_q)$ curve and let the source choose the optimal operating point on this 
curve to minimize overall expected distortion. 
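
To see how far apart the two operating points can be, the sketch below evaluates the outage probability that maximizes the outage capacity $(1-q)C_q$ and compares it with the one that minimizes the expected distortion. The Lambert-W function is computed with a plain Newton iteration so that only the standard library is needed. The distortion-optimal value is computed from the stationarity condition $P\bar\gamma\, t^2 + t - 1 = 0$ with $t = -\log(1-q)$, which is a worked step of this example; all function names are hypothetical.

```python
import math

def lambert_w(z, tol=1e-14, max_iter=100):
    """Principal branch W(z) for z >= 0 via Newton's method on w * exp(w) = z."""
    w = math.log1p(z)  # starting point at or above the root for z >= 0
    for _ in range(max_iter):
        step = (w * math.exp(w) - z) / (math.exp(w) * (w + 1.0))
        w -= step
        if abs(step) < tol:
            break
    return w

def outage_maximizing_outage_capacity(power, mean_gain):
    """Maximizer of (1 - q) * log(1 - P * mean_gain * log(1 - q))."""
    a = power * mean_gain
    return 1.0 - math.exp(-(math.exp(lambert_w(a)) - 1.0) / a)

def outage_minimizing_expected_distortion(power, mean_gain):
    """Minimizer of q + (1 - q) / (1 - P * mean_gain * log(1 - q))."""
    a = power * mean_gain
    return 1.0 - math.exp(-(math.sqrt(1.0 + 4.0 * a) - 1.0) / (2.0 * a))

q_capacity = outage_maximizing_outage_capacity(10.0, 1.0)
q_distortion = outage_minimizing_expected_distortion(10.0, 1.0)
```

For the assumed values the capacity-optimal outage probability is noticeably larger than the distortion-optimal one, which is the mismatch the text describes.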

Separation schemes with broadcast channel code: We have seen in Section III-A that a composite 
channel can be viewed as a broadcast channel with virtual receivers indexed by each 
channel state. A broadcast channel code can be applied to achieve rate $R_s$ when the channel is in 
state $s$. Since a Gaussian source is successively refinable [32], we can design a multi-resolution 
source code which, when combined with the broadcast channel code, achieves distortion $D(R_s)$ 
for each channel state $s$. The overall expected distortion is $E_S D(R_S)$. 

We assume a power allocation profile $\rho(\gamma) \ge 0$ which satisfies the overall power constraint 
$\int_0^\infty \rho(\gamma)\, d\gamma = P$. It is shown in [11] that the following rate, in units of nats per channel use, is 
achievable when the channel gain is $\gamma$: 

$$R(\gamma) = \int_0^\gamma \frac{u\rho(u)\, du}{1 + uI(u)}.$$

Here $I(\gamma) = \int_\gamma^\infty \rho(u)\, du$ is the interference level for channel state $\gamma$. The minimum expected 
distortion with a multi-resolution source code and a broadcast channel code is then 

$$\min_{\rho(\gamma)} \int_0^\infty \sigma^2 e^{-R(\gamma)} p(\gamma)\, d\gamma. \tag{42}$$

The optimization problem (42) was solved in [16], [33]. The optimal power allocation satisfies 



where 



0, 7 < 7p or 7 > 7, 

1p < 7 < 7, 



n(±- e-^du 



7 e~7/27 



and 7p solves I(tp) = P. The minimum expected distortion is 



D 2 = a 



DM + / ; 
Jo 
where 

e-i_I n e -(u+y)M (u/^yUu 
DM = 1±L = — . 

( 7 /^)- 1 e (7-7)/27 

In general the optimal power allocation $\rho_C^*(\gamma)$ that maximizes the expected capacity $\int_0^\infty R(\gamma) p(\gamma)\, d\gamma$, 
as determined in [11], is different from $\rho_D^*(\gamma)$ that minimizes the expected distortion (42). 
Therefore the channel should provide the entire capacity region boundary $\{(R_s)_{s \in \mathcal{S}}\}$ as the 
interface. 

In Fig. 3 we plot the expected distortion under the different source-channel coding schemes, 
assuming average channel gain $\bar\gamma = 1$ and source variance $\sigma^2 = 1$. It is observed that the 
broadcast channel code combined with the multi-resolution source code performs slightly better 
than the channel code for capacity versus outage combined with a single rate source code, but 
there is a large gap between their expected distortion and that of the optimal uncoded scheme. 




[Plot: expected distortion versus transmit power constraint (dB), horizontal axis from 0 to 20 dB.] 



Fig. 3. Expected distortion for various source-channel coding schemes 



V. Source-Channel Interface under Expected Distortion Metric 

When the end-to-end performance metric is expected distortion, separation schemes are usually 
suboptimal. In Section IV-C we showed an example of transmission of a Gaussian source over 
a slow fading Gaussian channel. The uncoded transmission scheme is optimal if the bandwidth 
expansion ratio $b = 1$. With bandwidth compression or expansion ($b \ne 1$), various joint 
source-channel coding schemes based on layering and hybrid analog-digital transmission [17]-[19] have 
been proposed to achieve lower expected distortion than separation schemes. However, even the 
simplest problem of transmitting a Gaussian source over a two-state composite Gaussian channel 
is still open: so far no generally optimal scheme is known. 

For joint coding schemes, the concept of source-channel information exchange through the 
interface still applies. Before transmission starts, in separation schemes the source and channel 
exploit the negotiation interface to agree on a single encoding rate or a set of encoding rates. In joint coding 
schemes, besides encoding rates, information about other source and channel statistics may be 
exchanged. For example, in hybrid digital-analog coding schemes [19] the channel provides 
the encoding rates for the digital part and the channel bandwidth for the analog part as the 
negotiation interface. 

After transmission starts, although we may not separate the encoder/decoder into a source 
encoder/decoder and a channel encoder/decoder for joint coding schemes, we can still identify 
a source processing unit and a channel processing unit in many cases. At the transmitter side, 
in contrast to that of a source encoder, the output of a source processing unit is not necessarily 
from an index set. For example, in a vector-quantization based joint coding scheme [34], the 
source processing unit provides both the quantization index and residue to the channel processing 
unit through the transmitter interface. Similarly at the receiver side, the channel processing unit 
provides an estimate of the quantization index and a noise-corrupted version of the quantization 
residue to the destination processing unit through the receiver interface. 

This notion of a source/channel processing unit is motivated by real applications where the data 
collection and data transmission occur in geographically dispersed locations. Sensor networks 
are one such example, where sensor nodes obtain some local observations and conduct some 
preliminary processing, and the processed data are then delivered to remote fusion centers for 
long-haul transmission. To some extent this notion of source/channel processing unit is a natural 
extension of source/channel encoder/decoder since it also follows the philosophy of design by 
module; however, the flexibility of separation is not retained - many schemes are tailored to the 
specific system and are not universally applicable if the source or channel is changed to other 
models. 

Various source-channel coding schemes, separate or joint, can be compared by their end-to-end 
expected distortions. The benefit of many joint coding schemes comes at a price of more 
information exchange through the interface. We believe a complete picture should represent 
each scheme by a point on a two-dimensional plot, which shows both end-to-end performance 
and interface complexity. The choice of the transmission scheme then depends on the system 
designer's view of the tradeoff between the two criteria. We illustrate this methodology through 
the following example. 

Consider transmission of a binary symmetric source (BSS) over a two-state composite BSC. Denote 
by $\alpha_i$, $i = 1, 2$, the crossover probability for each channel state. The two channel states 
occur with probability $(1 - p)$ and $p$, respectively. We assume $n$ source bits are transmitted over 
$m$ channel uses and $m > n$, i.e. the channel bandwidth expansion ratio $b = m/n > 1$. We 
also assume $0 < \alpha_1 < \alpha_2 < (1/2)$ and $b[1 - h(\alpha_1)] < 1$, so even the "good" channel state 1 
cannot achieve lossless transmission. The distortion measure between a source sequence and its 
reconstruction is the Hamming distance 

$$d(V^n, \hat V^n) = \frac{1}{n} \sum_{i=1}^{n} V_i \oplus \hat V_i.$$

A. Separate Source-Channel Coding 

Fig. 4. Separate coding scheme. MR source code with BC channel code. 

The two states of the composite BSC have a degraded relationship and can be viewed as two 
virtual receivers of a BSC-BC. The following rate pairs, in units of bits per channel use, are 
achievable using a broadcast channel code [29, p. 425] 

$$R_1 \le h(\alpha_1 * \beta) - h(\alpha_1),$$
$$R_2 \le 1 - h(\alpha_2 * \beta), \tag{43}$$

where $\alpha * \beta = \alpha(1 - \beta) + \beta(1 - \alpha)$, and $h(\alpha) = -\alpha \log \alpha - (1 - \alpha) \log(1 - \alpha)$ is the binary 
entropy function. The subscript $(\cdot)_2$ denotes the common information that can be decoded in 
both states, and the subscript $(\cdot)_1$ denotes the individual information that is decodable only in 
the good state. By varying $\beta$ between 0 and 1/2 we can trace the entire BC capacity region 
boundary. 
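
The boundary of the BSC broadcast rate region is easy to trace numerically. The following minimal sketch evaluates the rate pair $R_1 = h(\alpha_1 * \beta) - h(\alpha_1)$, $R_2 = 1 - h(\alpha_2 * \beta)$ for a given $\beta$, with base-2 logarithms so rates are in bits per channel use. The crossover probabilities used in the example and the function names are illustrative assumptions.

```python
import math

def binary_entropy(x):
    """Binary entropy h(x) in bits, with h(0) = h(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def star(a, b):
    """Binary convolution a * b = a(1 - b) + b(1 - a)."""
    return a * (1.0 - b) + b * (1.0 - a)

def bc_rate_pair(alpha1, alpha2, beta):
    """Boundary rate pair (R1, R2): R2 is the common (base) rate decodable in
    both states, R1 the refinement rate decodable only in the good state."""
    r1 = binary_entropy(star(alpha1, beta)) - binary_entropy(alpha1)
    r2 = 1.0 - binary_entropy(star(alpha2, beta))
    return r1, r2

# Trace the boundary for assumed crossover probabilities 0.05 and 0.2
boundary = [bc_rate_pair(0.05, 0.2, k / 20.0) for k in range(0, 11)]
```

At $\beta = 0$ all rate goes to the common message and at $\beta = 1/2$ all rate goes to the refinement message; intermediate values trade one against the other.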

Since a binary symmetric source is successively refinable under the Hamming distortion 
measure [32], we can match the BC code with a multi-resolution source code to achieve 
distortions 

$$D_1 = D(b(R_1 + R_2)),$$
$$D_2 = D(bR_2) \tag{44}$$

for each state, where $b$ is the bandwidth expansion ratio and $D(R)$ is the distortion-rate function 
of a BSS, i.e. the inverse function of $R(D) = 1 - h(D)$. The overall expected distortion is given 
by 

$$D^e_{\mathrm{BC}} = (1 - p)D_1 + pD_2.$$
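
A small numerical sketch makes the dependence of the expected distortion on $\beta$ concrete. Since $D(R)$ has no closed form, it is computed here by bisection on $R(D) = 1 - h(D)$ over $D \in [0, 1/2]$. The parameter values in the example and all function names are assumptions made for illustration.

```python
import math

def binary_entropy(x):
    """Binary entropy h(x) in bits, with h(0) = h(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def star(a, b):
    """Binary convolution a * b = a(1 - b) + b(1 - a)."""
    return a * (1.0 - b) + b * (1.0 - a)

def distortion_rate(rate, iters=60):
    """Inverse of R(D) = 1 - h(D) on D in [0, 1/2], found by bisection."""
    if rate <= 0.0:
        return 0.5
    if rate >= 1.0:
        return 0.0
    lo, hi = 0.0, 0.5
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if 1.0 - binary_entropy(mid) > rate:
            lo = mid  # R(mid) is too large, so the target D is larger
        else:
            hi = mid
    return (lo + hi) / 2.0

def expected_distortion_bc(alpha1, alpha2, p, b, beta):
    """(1 - p) * D(b(R1 + R2)) + p * D(b R2) on the BC rate boundary."""
    r1 = binary_entropy(star(alpha1, beta)) - binary_entropy(alpha1)
    r2 = 1.0 - binary_entropy(star(alpha2, beta))
    d1 = distortion_rate(b * (r1 + r2))
    d2 = distortion_rate(b * r2)
    return (1.0 - p) * d1 + p * d2

# Sweep beta for assumed parameters and pick the best operating point
betas = [k / 100.0 for k in range(0, 51)]
best_beta = min(betas, key=lambda x: expected_distortion_bc(0.05, 0.2, 0.1, 1.2, x))
```

The sweep over $\beta$ is the source-side choice of operating point on the rate region boundary; a finer grid or a one-dimensional optimizer could replace it.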

In Fig. 4 we show the block diagram of this separate source-channel coding scheme. The broadcast 
channel code has a structure of additive superposition encoding and successive decoding 
with interference cancellation [29, p. 379]. The multi-resolution source code is implemented as 
a multistage vector quantization (MSVQ) [35]. Using the test channel interpretation of 
rate-distortion theory [29, p. 343], we see that in the first stage, Source ENC 2 quantizes the source 
sequence $V^n$ by $\hat V_2^n$ and the residue $Q_2^n = V^n \oplus \hat V_2^n$ is a Bernoulli($D_2$) sequence. In the second 
stage, Source ENC 1 further quantizes $Q_2^n$ by $\hat V_1^n$ and the residue $Q_1^n = Q_2^n \oplus \hat V_1^n$ follows a 
Bernoulli($D_1$) distribution. Details about the structure of the MR source code and BC code are 
given in the appendix. 

TABLE V 

Interface for separation scheme: Multi-resolution source code and broadcast channel code 

Negotiation:  achievable distortion with MR source code $(D_1, D_2)$, BC capacity region $(R_1, R_2)$, 
              channel state probability $p$ 
Transmitter:  $\mathcal{M}_{m,t} = \{1, \ldots, 2^{mR_1}\} \times \{1, \ldots, 2^{mR_2}\}$ 
Receiver:     $\mathcal{M}_{m,1} = \mathcal{M}_{m,t}$ for channel state 1, $\mathcal{M}_{m,2} = \{1, 2, \ldots, 2^{mR_2}\}$ for channel state 2 


The interface of this scheme is summarized in Table V, i.e. Table IV specialized to the current
example. In Fig. 4 the dashed lines clearly separate the source and channel coders and identify
the transmitter and receiver interface. To measure the interface complexity, we consider the
number of bits per source symbol that are delivered through the interface. The complexity of
the transmitter interface is

$$K^{t}_{\mathrm{BC}} = b(R_1 + R_2),$$

and the receiver interface complexity is the expected capacity multiplied by the bandwidth
expansion ratio

$$K^{r}_{\mathrm{BC}} = b[(1-p)R_1 + R_2].$$

The separation scheme based on Shannon capacity is the special case $\beta = 0$. As a
result, $R_2 = 1 - h(\alpha_2)$ and $R_1 = 0$. We only transmit the base layer information and achieve
distortion $D_1 = D_2 = D(bR_2)$ in both states. The transmitter and receiver interface complexity
is $K^{t}_{\mathrm{Shannon}} = K^{r}_{\mathrm{Shannon}} = bR_2$ bits per source symbol.

Similarly, when $\beta = 1/2$ we have the separation scheme based on capacity versus outage.
Here $R_1 = 1 - h(\alpha_1)$ and $R_2 = 0$. We only transmit the refinement layer and achieve distortion
$D_1 = D(bR_1)$, $D_2 = 1/2$. The transmitter interface complexity is $K^{t}_{\mathrm{Outage}} = bR_1$, and the
receiver interface complexity is $K^{r}_{\mathrm{Outage}} = (1-p)bR_1$, which is proportional to the outage capacity.
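The two special cases can be checked numerically. The sketch below is illustrative only: it assumes the BC rate pair is met with equality, and the helper names (`bc_rates`, `bc_interface_complexity`) are hypothetical, introduced only for this example.

```python
import math


def h(x):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def conv(a, b):
    """Binary convolution a * b = a(1 - b) + (1 - a)b."""
    return a * (1 - b) + (1 - a) * b


def bc_rates(beta, alpha1, alpha2):
    """Boundary point (R1, R2) of the BC capacity region for a given beta."""
    r1 = h(conv(alpha1, beta)) - h(alpha1)
    r2 = 1 - h(conv(alpha2, beta))
    return r1, r2


def bc_interface_complexity(beta, p, alpha1, alpha2, b):
    """Transmitter and receiver interface complexity in bits per source symbol."""
    r1, r2 = bc_rates(beta, alpha1, alpha2)
    k_t = b * (r1 + r2)              # both layers cross the transmitter interface
    k_r = b * ((1 - p) * r1 + r2)    # refinement layer is delivered only in state 1
    return k_t, k_r
```

With `beta = 0.0` both values collapse to $b[1 - h(\alpha_2)]$ (Shannon capacity code), and with `beta = 0.5` the receiver value is $(1-p)$ times the transmitter value (capacity versus outage code).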

B. Systematic Coding 

Recall that $n$ source bits are transmitted in $m$ channel uses and we assume $m > n$. The
channel is divided into a primary channel and a secondary channel. The uncoded $n$ source
bits are directly transmitted over the secondary channel in $n$ channel uses. The output of the
secondary channel provides side information about the source sequence at the destination. We
then apply the Wyner-Ziv code [36], which is a source coding technique with side information at
the decoder, and transmit the encoder output over the primary channel in the remaining $(m - n)$
channel uses. The name systematic coding comes from its similarity to the systematic linear
block code [37, p. 85], where the input information bits are embedded in the output codewords.
This scheme is motivated by [17].

Fig. 5. Systematic coding scheme.

The rate-distortion function for Wyner-Ziv coding with side information is given by [36]

$$R^*(d) = \begin{cases} g(d), & 0 \le d \le d_c, \\ \dfrac{g(d_c)}{\alpha - d_c}(\alpha - d) = -g'(d_c)(\alpha - d), & d_c < d \le \alpha, \end{cases} \tag{45}$$

where $\alpha$ is the BSC crossover probability, the function $g(d)$ is defined as

$$g(d) = \begin{cases} h(\alpha * d) - h(d), & 0 \le d < \alpha, \\ 0, & d = \alpha, \end{cases}$$

$g'(d)$ is the derivative of $g(d)$, and the turning point $d_c$ is the solution to

$$\frac{g(d_c)}{d_c - \alpha} = g'(d_c). \tag{46}$$
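To make (45) and (46) concrete, here is a minimal Python sketch that locates the turning point $d_c$ and evaluates $R^*(d)$. Solving (46) by bisection is a choice made for this illustration, not a method taken from [36]. It relies on $g$ being convex on $(0, \alpha)$, so that $g(d) - g'(d)(d - \alpha)$ changes sign exactly once. All function names are hypothetical.

```python
import math


def h(x):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def conv(a, b):
    """Binary convolution a * b = a(1 - b) + (1 - a)b."""
    return a * (1 - b) + (1 - a) * b


def g(d, alpha):
    """g(d) = h(alpha * d) - h(d) for 0 <= d < alpha, and 0 at d = alpha."""
    if d >= alpha:
        return 0.0
    return h(conv(alpha, d)) - h(d)


def g_prime(d, alpha):
    """Derivative of h(alpha * d) - h(d) on (0, alpha); h'(x) = log2((1-x)/x)."""
    a = conv(alpha, d)
    return (1 - 2 * alpha) * math.log2((1 - a) / a) - math.log2((1 - d) / d)


def turning_point(alpha, iters=200):
    """Solve g(d) = g'(d)(d - alpha), i.e. (46), on (0, alpha) by bisection."""
    lo, hi = 1e-12, alpha * (1 - 1e-12)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid, alpha) - g_prime(mid, alpha) * (mid - alpha) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def wyner_ziv_rate(d, alpha):
    """R*(d) of (45): g(d) up to d_c, then the tangent line down to (alpha, 0)."""
    d_c = turning_point(alpha)
    if d <= d_c:
        return g(d, alpha)
    if d >= alpha:
        return 0.0
    return g(d_c, alpha) * (alpha - d) / (alpha - d_c)
```

For $d_c < d < \alpha$ the returned value lies on the chord between $(d_c, g(d_c))$ and $(\alpha, 0)$, which is the time-sharing segment of (45).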
We give a brief review of the achievability of the rate-distortion function $R^*(d)$. Notice that
$R^*(\alpha) = 0$ is achievable by simply observing the side information, i.e. the secondary channel
output due to the uncoded source bits. We focus on the case of $0 \le d \le d_c$. For $d_c < d < \alpha$, $R^*(d)$
is achievable by time sharing between $(\alpha, 0)$ and $(d_c, R^*(d_c))$. Basically, for a source sequence
$V^n$ drawn i.i.d. from a Bernoulli(1/2) distribution, the output of the secondary channel is

$$V_\alpha^n = V^n \oplus Q_\alpha^n,$$

where the channel noise $Q_\alpha^n$ is an i.i.d. Bernoulli($\alpha$) sequence. The Wyner-Ziv codebook $\mathcal{C}$
consists of $2^{n[1-h(d)]}$ codewords $\hat{V}^n$, drawn i.i.d. from a Bernoulli(1/2) distribution. We can
approximate each source sequence $V^n$ by a quantized version $\hat{V}^n$ with residue $Q_d^n$, i.e.

$$V^n = \hat{V}^n \oplus Q_d^n.$$

Using the test channel concept of rate-distortion theory [29, p. 343], $Q_d^n$ is an i.i.d. Bernoulli($d$)
sequence independent of $\hat{V}^n$. We want to recover $\hat{V}^n$ at the destination in order to estimate the
source sequence $V^n$ within distortion $d$. Without side information, we have to transmit the index
of each $\hat{V}^n$ using $\log |\mathcal{C}| = n[1 - h(d)]$ bits. On the other hand, the secondary channel output

$$V_\alpha^n = V^n \oplus Q_\alpha^n = \hat{V}^n \oplus Q_d^n \oplus Q_\alpha^n$$

also provides information about $\hat{V}^n$ in terms of $I(V_\alpha^n; \hat{V}^n) = n[1 - h(\alpha * d)]$. Using the random
binning technique [29, p. 411], we can uniformly distribute the $\hat{V}^n$ sequences into

$$\frac{2^{n[1-h(d)]}}{2^{n[1-h(\alpha * d)]}} = 2^{n[h(\alpha * d) - h(d)]}$$

bins, transmit the bin index $j(\hat{V}^n)$ instead of the sequence index, and hence reduce the encoding
rate from $1 - h(d)$ to $h(\alpha * d) - h(d)$. With receiver side information the sequence $\hat{V}^n$ can still
be decoded with small error. This approach is formalized in [36, Sec. II].
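The rate bookkeeping of the binning argument can be written out as a short Python sketch. It only reproduces the per-symbol exponents stated above and does not implement a Wyner-Ziv code. The name `binning_rates` is a hypothetical helper.

```python
import math


def h(x):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def conv(a, b):
    """Binary convolution a * b = a(1 - b) + (1 - a)b."""
    return a * (1 - b) + (1 - a) * b


def binning_rates(alpha, d):
    """Per-symbol rates (bits) in the binning argument for side-information noise alpha."""
    index_rate = 1 - h(d)                # sending the codeword index directly
    side_info = 1 - h(conv(alpha, d))    # information the side channel already carries
    bin_rate = index_rate - side_info    # sending only the bin index
    return index_rate, side_info, bin_rate
```

The third returned value equals $h(\alpha * d) - h(d)$, the first branch of (45), and shrinks as the side information improves (smaller $\alpha$).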

The Wyner-Ziv coding rate depends on the quality of the side information, i.e. the BSC
crossover probability $\alpha$. We can construct two systematic codes, one for each channel state
$\alpha = \alpha_i$, $i = 1, 2$. For the systematic code targeting the good channel state, if the channel is
indeed in the good state, we can decode the Wyner-Ziv code with the secondary channel output
$V_\alpha^n$ as side information, and the achievable distortion is determined by

$$R_1^*(D_1) = (b-1)C_1 = (b-1)[1 - h(\alpha_1)],$$

where $C_1$ is the channel capacity for the good state and $R_1^*(d)$ is the rate-distortion function (45) with
$\alpha = \alpha_1$. Note that this information is transmitted over the primary channel with bandwidth
expansion ratio $(b - 1)$, since it only consists of $m - n = (b - 1)n$ channel uses. If the channel is
actually in the bad state, we cannot decode the Wyner-Ziv code. Instead we estimate the source
by the secondary channel output and achieve a distortion $D_2 = \alpha_2$.
TABLE VI

Interface for systematic coding scheme targeting the good channel state

Negotiation: Wyner-Ziv rate-distortion function $R_1^*(d)$, primary channel capacity $C_1$, secondary channel statistics ($n$ uses of BSC), channel state probability $p$

Transmitter: uncoded source sequence $V^n$, Wyner-Ziv encoder output $\mathcal{M}_{m-n,t} = \{1, 2, \cdots, 2^{(m-n)C_1}\}$

Receiver: secondary channel output $V_\alpha^n$ for both states, $\mathcal{M}_{m-n,1} = \mathcal{M}_{m-n,t}$ for channel state 1 only.



The interfaces of this scheme are summarized in Table VI and are also illustrated by the
dashed lines in Fig. 5. The interfaces divide the source and the channel processing units so that
we can still design by module, but these processing units are no longer categorized as source
or channel coders because of the uncoded transmission over the secondary channel. Similar
to previous separation schemes, we measure the interface complexity by the number of bits
per source symbol that are delivered through the interface. The complexity of the transmitter
interface is

$$K^{t}_{\mathrm{SYS},1} = 1 + (b-1)[1 - h(\alpha_1)],$$

and the complexity of the receiver interface is

$$K^{r}_{\mathrm{SYS},1} = 1 + (1-p)(b-1)[1 - h(\alpha_1)].$$

Similarly we can construct a systematic code targeting the bad channel state. If the channel
is indeed in state 2, the achievable distortion $D_2$ is determined by

$$R_2^*(D_2) = (b-1)[1 - h(\alpha_2)], \tag{47}$$

where $R_2^*(d)$ is the rate-distortion function (45) with $\alpha = \alpha_2$. If the channel is in the good state,
we have different options:

• $D_2 \le d_{c2}$, where $d_{c2}$ is the turning point given by (46). Here the source code does not
involve any time-sharing. The quality of the side information is actually better than targeted
so we can also perform Wyner-Ziv decoding, recover $\hat{V}^n$, and reconstruct the source within
distortion $D_2$. Or we can simply observe the secondary channel output and achieve a
distortion of $\alpha_1$. Therefore $D_1 = \min\{D_2, \alpha_1\}$.

• $D_2 > d_{c2}$. Here the source code involves a time sharing between the uncoded transmission
and the Wyner-Ziv code with distortion $d_{c2}$. The time sharing factor $\theta$ is determined by

  $$D_2 = \theta d_{c2} + (1 - \theta)\alpha_2.$$

  In the good state, for proportion $(1 - \theta)$ of the time, we use the secondary channel output
and achieve a distortion of $\alpha_1$. For proportion $\theta$ of the time, we can reconstruct the source
from the Wyner-Ziv code or the secondary channel output, and achieve a distortion of
$\min\{d_{c2}, \alpha_1\}$. The overall distortion after time-sharing becomes

  $$D_1 = \theta \min\{d_{c2}, \alpha_1\} + (1 - \theta)\alpha_1.$$

The above two cases can be combined as follows

$$D_1 = \begin{cases} \alpha_1, & \alpha_1 \le \min\{D_2, d_{c2}\}, \\ D_2, & D_2 \le d_{c2},\ D_2 < \alpha_1, \\ \theta d_{c2} + (1 - \theta)\alpha_1, & d_{c2} < D_2,\ d_{c2} < \alpha_1. \end{cases}$$

The complexity of the transmitter interface is

$$K^{t}_{\mathrm{SYS},2} = 1 + (b-1)[1 - h(\alpha_2)],$$

and the complexity of the receiver interface is

$$K^{r}_{\mathrm{SYS},2} = \begin{cases} 1 + p(b-1)[1 - h(\alpha_2)], & \alpha_1 \le \min\{D_2, d_{c2}\}, \\ 1 + (b-1)[1 - h(\alpha_2)], & \alpha_1 > \min\{D_2, d_{c2}\}, \end{cases}$$

i.e. for the good channel state we perform Wyner-Ziv decoding if and only if $\alpha_1 > \min\{D_2, d_{c2}\}$.
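The case analysis for the code targeting the bad state can be written as a small Python sketch. It takes $D_2$ and $d_{c2}$ as inputs (they would come from solving (47) and (46)), and the function names are hypothetical. It is meant only to show that the two bullet cases and the combined three-branch expression agree.

```python
import math


def h(x):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def good_state_distortion(d2, d_c2, alpha1, alpha2):
    """D1 achieved in the good state by the systematic code targeting the bad state."""
    if d2 <= d_c2:
        # No time sharing: best of Wyner-Ziv decoding and the raw secondary output.
        return min(d2, alpha1)
    # Time sharing: D2 = theta * d_c2 + (1 - theta) * alpha2.
    theta = (alpha2 - d2) / (alpha2 - d_c2)
    return theta * min(d_c2, alpha1) + (1 - theta) * alpha1


def receiver_complexity(d2, d_c2, alpha1, alpha2, b, p):
    """Receiver interface complexity in bits per source symbol."""
    wz_bits = (b - 1) * (1 - h(alpha2))
    if alpha1 <= min(d2, d_c2):
        # Wyner-Ziv output is useful only in the bad state (probability p).
        return 1 + p * wz_bits
    # Wyner-Ziv decoding is performed in both states.
    return 1 + wz_bits
```

For example, with `alpha1 = 0.25`, `alpha2 = 0.45`, `d_c2 = 0.40` and `d2 = 0.44`, the side information alone is already better than anything the Wyner-Ziv code offers in the good state, so `good_state_distortion` returns `alpha1` up to rounding.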

C. Quantization Residue Splitting 

The block diagram of this coding scheme is shown in Fig. 6. The overall channel is divided
into two subchannels, a secondary channel of $\rho n$ channel uses, $0 \le \rho \le 1$, and a primary channel
of the remaining $m - \rho n = (b - \rho)n$ channel uses. For the primary channel, we use the same
BC code as in Section V-A to achieve the rate pair (43)

$$R_1 \le h(\alpha_1 * \beta) - h(\alpha_1), \qquad R_2 \le 1 - h(\alpha_2 * \beta).$$

Fig. 6. Quantization residue splitting scheme.

Similar to the MR code in Section V-A, we first quantize the source sequence $V^n$ at rate $(b - \rho)R_2$.
Note that the bandwidth expansion ratio for the primary channel is $(b - \rho)$. The quantization
output $V_2^n$ is to be decoded in both channel states. The quantization residue $Q_2^n = V^n \oplus V_2^n$
follows a Bernoulli($d_2$) distribution with

$$d_2 = D((b - \rho)R_2).$$

We then split the residue into two sequences, $Q_2^{1:(1-\rho)n}$ of the first $(1 - \rho)n$ bits, and $Q_2^{(1-\rho)n+1:n}$
of the remaining $\rho n$ bits. The sequence $Q_2^{1:(1-\rho)n}$ is quantized at rate $\frac{b-\rho}{1-\rho}R_1$. The output $V_1^{1:(1-\rho)n}$
is to be decoded by channel state 1 only, and it is superimposed over the first-stage quantization
output $V_2^n$ and transmitted over the primary channel using the previous BC code. The sequence
$Q_2^{(1-\rho)n+1:n}$ is directly transmitted over the secondary channel, and the channel output is

$$Z^{m-\rho n+1:m} = Q_2^{(1-\rho)n+1:n} \oplus Q_\alpha^{(1-\rho)n+1:n},$$

where $Q_\alpha^{(1-\rho)n+1:n}$, $\alpha = \alpha_i$, $i = 1, 2$, is the channel noise for each state. The separation scheme
in Section V-A can be viewed as the special case of $\rho = 0$. Extension to the current residue
splitting scheme is motivated by [19].

In the good channel state, the first $(1 - \rho)n$ bits are reconstructed by decoding both layers,
i.e.

$$\hat{V}^{1:(1-\rho)n} = V_2^{1:(1-\rho)n} \oplus V_1^{1:(1-\rho)n}.$$

The achievable distortion is

$$d_1 = D\left(\frac{b-\rho}{1-\rho}R_1 + (b - \rho)R_2\right).$$

The remaining $\rho n$ bits can be reconstructed by either the first layer only, i.e. $V_2^{(1-\rho)n+1:n}$, to
achieve a distortion of $d_2$, or further combined with the secondary channel output, i.e.

$$\hat{V}^{(1-\rho)n+1:n} = V_2^{(1-\rho)n+1:n} \oplus Z^{m-\rho n+1:m} = V_2^{(1-\rho)n+1:n} \oplus Q_2^{(1-\rho)n+1:n} \oplus Q_{\alpha_1}^{(1-\rho)n+1:n},$$

to achieve a distortion of $\alpha_1$. The overall achievable distortion for the good state is

$$D_1 = (1 - \rho)d_1 + \rho \min\{d_2, \alpha_1\}.$$

In the bad channel state, we cannot decode the refinement layer and the reconstruction by the
base layer only achieves a distortion of $d_2$. However, for the last $\rho n$ bits, we can also combine
the base layer decoding output with the secondary channel output to achieve a distortion of $\alpha_2$.
Therefore the overall achievable distortion for the bad state is

$$D_2 = (1 - \rho)d_2 + \rho \min\{d_2, \alpha_2\}.$$



The interfaces of this scheme are summarized in Table VII and illustrated by the dashed lines
in Fig. 6. The complexity of the transmitter interface, measured as the number of bits per source
symbol delivered through the interface, is equal to

$$K^{t}_{\mathrm{QRS}} = (b - \rho)(R_1 + R_2) + \rho,$$

where the subscript $(\cdot)_{\mathrm{QRS}}$ denotes quantization residue splitting. The complexity of the receiver
interface is

$$K^{r}_{\mathrm{QRS}} = \begin{cases} (b - \rho)[(1-p)R_1 + R_2], & d_2 \le \alpha_1, \\ (b - \rho)[(1-p)R_1 + R_2] + (1-p)\rho, & \alpha_1 < d_2 \le \alpha_2, \\ (b - \rho)[(1-p)R_1 + R_2] + \rho, & d_2 > \alpha_2, \end{cases}$$

i.e., for the primary channel the base layer output is delivered in both states and the refinement
layer only in channel state 1. The secondary channel output is delivered to the destination
processing unit in state $i$ if $d_2 > \alpha_i$.
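The distortion and complexity expressions of the residue splitting scheme can be collected in one Python sketch. It assumes the BC rate pair (43) is met with equality and inverts $R(D) = 1 - h(D)$ by bisection. The function names and the handling of the boundary case $\rho = 1$ (where the first segment is empty) are choices made for this illustration.

```python
import math


def h(x):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def conv(a, b):
    """Binary convolution a * b = a(1 - b) + (1 - a)b."""
    return a * (1 - b) + (1 - a) * b


def distortion_rate(rate):
    """D(R) of a BSS: inverse of R(D) = 1 - h(D) on [0, 1/2], by bisection."""
    if rate <= 0.0:
        return 0.5
    if rate >= 1.0:
        return 0.0
    lo, hi = 0.0, 0.5
    for _ in range(100):
        mid = (lo + hi) / 2
        if 1 - h(mid) > rate:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def qrs_layers(beta, rho, alpha1, alpha2, b):
    """Rates (R1, R2) and per-segment distortions (d1, d2)."""
    r1 = h(conv(alpha1, beta)) - h(alpha1)
    r2 = 1 - h(conv(alpha2, beta))
    d2 = distortion_rate((b - rho) * r2)
    if rho < 1.0:
        d1 = distortion_rate((b - rho) / (1 - rho) * r1 + (b - rho) * r2)
    else:
        d1 = d2  # empty first segment; its weight (1 - rho) is zero anyway
    return r1, r2, d1, d2


def qrs_distortions(beta, rho, alpha1, alpha2, b):
    """Overall distortion pair (D1, D2) for the good and bad state."""
    _, _, d1, d2 = qrs_layers(beta, rho, alpha1, alpha2, b)
    big_d1 = (1 - rho) * d1 + rho * min(d2, alpha1)
    big_d2 = (1 - rho) * d2 + rho * min(d2, alpha2)
    return big_d1, big_d2


def qrs_interface_complexity(beta, rho, p, alpha1, alpha2, b):
    """Transmitter and receiver interface complexity in bits per source symbol."""
    r1, r2, _, d2 = qrs_layers(beta, rho, alpha1, alpha2, b)
    k_t = (b - rho) * (r1 + r2) + rho
    k_r = (b - rho) * ((1 - p) * r1 + r2)
    if d2 > alpha2:
        k_r += rho              # secondary output used in both states
    elif d2 > alpha1:
        k_r += (1 - p) * rho    # secondary output used in the good state only
    return k_t, k_r
```

Setting `rho = 0.0` reproduces the broadcast scheme, which gives a simple consistency check on both functions.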

TABLE VII

Interface for quantization residue splitting scheme

Negotiation: rate-distortion pair $(d_1, d_2)$ for the MR source code, primary channel BC capacity region $(R_1, R_2)$, secondary channel statistics ($\rho n$ uses of BSC), channel state probability $p$

Transmitter: uncoded partial quantization residue sequence $Q_2^{(1-\rho)n+1:n}$, $\mathcal{M}_{m-\rho n,t} = \{1, \cdots, 2^{(m-\rho n)R_1}\} \times \{1, \cdots, 2^{(m-\rho n)R_2}\}$

Receiver: $\mathcal{M}_{m-\rho n,1} = \mathcal{M}_{m-\rho n,t}$ for channel state 1, $\mathcal{M}_{m-\rho n,2} = \{1, \cdots, 2^{(m-\rho n)R_2}\}$ for channel state 2, secondary channel output $Z^{m-\rho n+1:m}$ for channel state $i$ if $d_2 > \alpha_i$, $i = 1, 2$.

D. Numerical Examples

We provide some numerical examples to compare different schemes in this section. We assume
the two states of the composite BSC have crossover probabilities $\alpha_1 = 0.25$ and $\alpha_2 = 0.45$, and
the bandwidth expansion ratio $b = 2$.



Fig. 7. Achievable distortion region $(D_1, D_2)$ for various schemes. Legend: Shannon capacity code; capacity vs. outage code; broadcast code; systematic code: state 1; systematic code: state 2; residue splitting scheme.



In Fig. 7 we plot the achievable distortion pair $(D_1, D_2)$ for each scheme. For the broadcast
coding scheme, by varying the auxiliary variable $\beta$ from 0 to 1/2, we change the rate allocation
between the base layer ($R_2$) and the refinement layer ($R_1$). The separation schemes using the
Shannon capacity code and the capacity versus outage code are the special cases of $\beta = 0$
and $\beta = 1/2$, respectively. They are marked by the two end-points of the broadcast distortion region
boundary. For the quantization residue splitting scheme, we calculate the distortion pairs $(D_1, D_2)$
for different parameters $0 \le \beta \le 1/2$ and $0 \le \rho \le 1$. The plotted curve is the convex hull of
all achievable distortion pairs. Note that the broadcast scheme is a special case of the residue
splitting scheme with $\rho = 0$, so the broadcast distortion region lies strictly within the residue
splitting distortion region. There are two systematic codes, one targeting each channel state.
They are represented by two points, both outside the residue splitting distortion region.

In Fig. 8 we plot the expected distortion of various schemes for different channel state
distributions. Each systematic code achieves a single distortion pair, so the expected distortion is
simply the weighted average and increases linearly with the bad channel state probability $p$. For
broadcast and residue splitting schemes, we need to choose the optimal point on the distortion
region boundary at each channel state probability. Since the broadcast scheme is a special case
of the residue splitting scheme, its expected distortion is no less, and sometimes strictly larger,
than that of the residue splitting scheme. The scheme that achieves the lowest expected distortion
depends on the range of $p$. For $p < 0.378$ or $p > 0.956$ it is the residue splitting
scheme, for $0.378 < p < 0.845$ it is the systematic code for the good channel state, and for
$0.845 < p < 0.956$ it is the systematic code for the bad channel state.

Fig. 8. Expected distortion for various channel state distributions.
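Because each systematic code yields a single distortion pair, the probability at which the two systematic codes swap order can be computed directly. The Python sketch below is an illustration under stated assumptions: it solves (46) by bisection, inverts the Wyner-Ziv function (45) piecewise, and assumes the target rate does not exceed $h(\alpha)$. All function names are hypothetical. Its output for $\alpha_1 = 0.25$, $\alpha_2 = 0.45$, $b = 2$ can be compared against the threshold 0.845 quoted in the text.

```python
import math


def h(x):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def conv(a, b):
    """Binary convolution a * b = a(1 - b) + (1 - a)b."""
    return a * (1 - b) + (1 - a) * b


def g(d, alpha):
    """g(d) = h(alpha * d) - h(d) for 0 <= d < alpha."""
    return h(conv(alpha, d)) - h(d)


def g_prime(d, alpha):
    """Derivative of g on (0, alpha); h'(x) = log2((1-x)/x)."""
    a = conv(alpha, d)
    return (1 - 2 * alpha) * math.log2((1 - a) / a) - math.log2((1 - d) / d)


def turning_point(alpha):
    """Solve (46) on (0, alpha) by bisection."""
    lo, hi = 1e-12, alpha * (1 - 1e-12)
    for _ in range(200):
        mid = (lo + hi) / 2
        if g(mid, alpha) - g_prime(mid, alpha) * (mid - alpha) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def wz_distortion(rate, alpha):
    """Return (d, d_c) with R*(d) = rate, assuming 0 <= rate <= h(alpha)."""
    d_c = turning_point(alpha)
    g_c = g(d_c, alpha)
    if rate <= g_c:  # time-sharing segment of (45)
        return alpha - rate * (alpha - d_c) / g_c, d_c
    lo, hi = 0.0, d_c  # g is decreasing on [0, d_c]
    for _ in range(200):
        mid = (lo + hi) / 2
        if g(mid, alpha) > rate:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2, d_c


def systematic_pairs(alpha1, alpha2, b):
    """Distortion pairs (D1, D2) of the codes targeting the good and the bad state."""
    d1_good, _ = wz_distortion((b - 1) * (1 - h(alpha1)), alpha1)
    d2_bad, d_c2 = wz_distortion((b - 1) * (1 - h(alpha2)), alpha2)
    if d2_bad <= d_c2:
        d1_bad = min(d2_bad, alpha1)
    else:
        theta = (alpha2 - d2_bad) / (alpha2 - d_c2)
        d1_bad = theta * min(d_c2, alpha1) + (1 - theta) * alpha1
    return (d1_good, alpha2), (d1_bad, d2_bad)


def crossover_probability(pair_a, pair_b):
    """p at which (1 - p) D1 + p D2 is the same for both pairs."""
    (a1, a2), (b1, b2) = pair_a, pair_b
    return (b1 - a1) / ((b1 - a1) + (a2 - b2))
```

A typical call is `crossover_probability(*systematic_pairs(0.25, 0.45, 2.0))`.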
Expected distortion alone does not provide the complete picture for comparison of the schemes.
In Figs. 9 and 10 we assume the channel state probability $p = 0.7$ and illustrate the tradeoff
between the expected distortion and the transmitter/receiver interface complexity for different
schemes, where the complexity is measured by bits per source symbol delivered through the
interface. For the broadcast scheme, we can reduce the expected distortion by increasing $\beta$,
which reduces the base layer rate but increases the refinement layer rate and the total rate, and
hence the interface complexity. However, the distortion-complexity curve is not strictly decreasing.
Once we reach the minimum expected distortion, further increasing the interface complexity
provides no additional benefit. The same trend is also observed in the residue splitting scheme.
At channel state probability $p = 0.7$, the systematic code targeting the good state has the lowest
expected distortion, but it also has the highest interface complexity. The choice of
the appropriate scheme and operating points (parameters) depends on the system designer's view
of this distortion-complexity tradeoff.

Fig. 9. Transmitter interface complexity vs. expected distortion tradeoff.

Fig. 10. Receiver interface complexity vs. expected distortion tradeoff.

VI. Conclusions 

We consider transmission of a stationary ergodic source over non-ergodic composite channels 
with channel state information at the receiver (CSIR). To study the source-channel coding 
problem for the entire system, we include a broader class of transmission schemes as separation 
schemes by relaxing the constraint of Shannon separation, i.e. a single-number comparison 
between source coding rate and channel capacity, and introducing the concept of a
source-channel interface which allows the source and channel to agree on multiple parameters.

We show that different end-to-end distortion metrics lead to different conclusions about 
separation optimality, even for the same source and channel models. Specifically, one such gen- 
eralized scheme guarantees the separation optimality under the distortion versus outage metric. 
Separation schemes are in general suboptimal under the expected distortion metric. We study 
the performance enhancement when the source and channel coders exchange more information 
through a more sophisticated interface, and illustrate the tradeoff between interface complexity 
and end-to-end performance through the example of transmission of a binary symmetric source 
over a composite binary symmetric channel. 

Appendix I 

MR Source Code and BC Channel Code Structure 

In Fig. 4 the multi-resolution source code can be constructed as follows. Consider three independent
auxiliary random variables $V_1 \sim$ Bernoulli($\Delta$), $V_2 \sim$ Bernoulli(1/2), and $Q_1 \sim$ Bernoulli($D_1$),
where

$$\Delta = \frac{D_2 - D_1}{1 - 2D_1}$$

and $D_1$, $D_2$ are given by (44). Also define

$$Q_2 = V_1 \oplus Q_1,$$

which has a Bernoulli distribution with parameter $\Delta * D_1 = D_2$. These variables are related to
the source symbol through the relationship

$$V = V_2 \oplus Q_2 = V_2 \oplus V_1 \oplus Q_1.$$
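The choice of the Bernoulli parameter of $V_1$ can be checked with a few lines of Python. The sketch assumes $D_1 \le D_2 \le 1/2$ and uses $\Delta = (D_2 - D_1)/(1 - 2D_1)$, which is what the requirement $\Delta * D_1 = D_2$ implies. The names `delta` and `xor_parameter` are hypothetical.

```python
def conv(a, b):
    """Binary convolution a * b = a(1 - b) + (1 - a)b."""
    return a * (1 - b) + (1 - a) * b


def delta(d1, d2):
    """Parameter of V1 such that conv(delta, d1) = d2, for d1 <= d2 <= 1/2."""
    return (d2 - d1) / (1 - 2 * d1)


def xor_parameter(p1, p2):
    """P(A xor B = 1) for independent A ~ Bernoulli(p1), B ~ Bernoulli(p2), by enumeration."""
    total = 0.0
    for a, pa in ((0, 1 - p1), (1, p1)):
        for b, pb in ((0, 1 - p2), (1, p2)):
            if a ^ b:
                total += pa * pb
    return total
```

For instance, `delta(0.1, 0.3)` is 0.25 up to rounding, and `xor_parameter(0.25, 0.1)` then recovers 0.3, so $Q_2 = V_1 \oplus Q_1$ has the distribution required of the first-stage residue.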

Random codebook generation: Generate $2^{nbR_2}$ sequences $V_2^n(w_2)$, $w_2 \in \{1, \cdots, 2^{nbR_2}\}$, by
uniform and independent sampling over the strong typical set $T_\epsilon^n(V_2)$. Similarly, generate $2^{nbR_1}$
sequences $V_1^n(w_1)$, $w_1 \in \{1, \cdots, 2^{nbR_1}\}$, drawn uniformly and independently over $T_\epsilon^n(V_1)$.

Encoding: Given $V^n \in \mathcal{V}^n$, the encoder searches over $(w_1, w_2) \in \{1, \cdots, 2^{nbR_1}\} \times \{1, \cdots, 2^{nbR_2}\}$.
If it finds a pair $(w_1, w_2)$ such that

$$(V^n, V_1^n(w_1), V_2^n(w_2)) \in T_\epsilon^n(V, V_1, V_2),$$

it stops the search and sends the above $(w_1, w_2)$. Otherwise it sends $(w_1, w_2) = (1, 1)$.

Decoding: If only the index $w_2$ is received, the decoder declares the estimate of the source
sequence as $V_2^n = V_2^n(w_2)$. If both indices are received, the source is reconstructed as
$\hat{V}^n = V_1^n \oplus V_2^n = V_1^n(w_1) \oplus V_2^n(w_2)$. Following the procedures in [38] and [39, Theorem 1] we can easily
verify the following distortion targets are achievable: $E\,d(V^n, \hat{V}^n) \le D_1$, $E\,d(V^n, V_2^n) \le D_2$.

In practice the MR source code can be implemented as a multi-stage vector quantization, which
has an additive successive refinement structure [39]. As shown in Fig. 4, in channel state 2 only
the base layer description is received and Source DEC 2 determines the base reconstruction $V_2^n$.
When both layers are received, Source DEC 1 determines a refinement sequence $V_1^n$ based
on the refinement layer encoding index only, and adds it to the base reconstruction $V_2^n$ to
obtain the overall reconstruction $\hat{V}^n$. In contrast, for general MR source codes the overall
reconstruction may require a joint decoding of indices from both layers. The additive refinement
structure reduces coding complexity, provides scalability, and does not incur any performance
loss under certain conditions [39, Theorem 3], which are all satisfied in this example.

The broadcast channel code design, for a chosen $0 \le \beta \le 1/2$, is summarized as follows.

Random codebook generation: Generate $2^{nbR_2} = 2^{mR_2}$ independent codewords $U^m(w_2)$, $w_2 \in
\{1, \cdots, 2^{mR_2}\}$, by i.i.d. sampling of a Bernoulli(1/2) distribution. Generate $2^{nbR_1} = 2^{mR_1}$
independent codewords $Q_\beta^m(w_1)$, $w_1 \in \{1, \cdots, 2^{mR_1}\}$, by i.i.d. sampling of a Bernoulli($\beta$)
distribution.

Encoding: To send the index pair $(w_1, w_2)$, send $X^m = Q_\beta^m(w_1) \oplus U^m(w_2)$.

Decoding: Given channel output $Z^m$, in state 2 we determine the unique $\hat{w}_2$ such that

$$d(Z^m, U^m(\hat{w}_2)) \le (\alpha_2 * \beta).$$

In state 1 we look for the unique indices $(\hat{w}_1, \hat{w}_2)$ such that

$$d(Z^m, U^m(\hat{w}_2)) \le (\alpha_1 * \beta),$$

$$d(Z^m, Q_\beta^m(\hat{w}_1) \oplus U^m(\hat{w}_2)) \le \alpha_1.$$

Following the analysis of [29, Theorem 14.6.2], we can show that the channel decoding error
probability approaches zero as long as the encoding rates satisfy (43).
Roughly speaking, in channel state 2, we observe

$$Z^m = X^m \oplus Q_{\alpha_2}^m = U^m \oplus Q_\beta^m \oplus Q_{\alpha_2}^m,$$

where the channel noise $Q_{\alpha_2}^m$ is a Bernoulli($\alpha_2$) sequence. We want to decode the $U^m$ sequence
subject to the overall interference-plus-noise $Q_\beta^m \oplus Q_{\alpha_2}^m$, which is a Bernoulli sequence with
parameter $(\alpha_2 * \beta)$, hence the achievable rate $1 - h(\alpha_2 * \beta)$. In channel state 1, we observe

$$Z^m = X^m \oplus Q_{\alpha_1}^m = U^m \oplus Q_\beta^m \oplus Q_{\alpha_1}^m.$$

Since $\alpha_1 < \alpha_2$, the sequence $U^m$ can be decoded and then subtracted off. We then decode $Q_\beta^m$
subject to the noise $Q_{\alpha_1}^m$, and the rate $h(\alpha_1 * \beta) - h(\alpha_1)$ is achievable.
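The noise levels seen by the two decoding stages can be illustrated by a short Monte Carlo sketch in Python. It does not implement the random codebooks or the typicality decoder. It only draws one superposed codeword and checks the empirical Hamming distances that the decoding rule thresholds. The function names, block length, and seed are arbitrary choices for this example.

```python
import random


def conv(a, b):
    """Binary convolution a * b = a(1 - b) + (1 - a)b."""
    return a * (1 - b) + (1 - a) * b


def bernoulli(rng, prob, m):
    """m i.i.d. Bernoulli(prob) bits."""
    return [1 if rng.random() < prob else 0 for _ in range(m)]


def simulate_superposition(alpha, beta, m=200_000, seed=0):
    """Empirical noise seen by the base layer and, after cancellation, by the refinement layer."""
    rng = random.Random(seed)
    u = bernoulli(rng, 0.5, m)         # base layer codeword U^m
    q_beta = bernoulli(rng, beta, m)   # refinement layer codeword Q_beta^m
    q_alpha = bernoulli(rng, alpha, m) # channel noise Q_alpha^m
    z = [a ^ b ^ c for a, b, c in zip(u, q_beta, q_alpha)]
    # Base layer decoding treats the refinement codeword as extra noise.
    dist_base = sum(zi ^ ui for zi, ui in zip(z, u)) / m
    # After subtracting U^m, only the channel noise separates Z from Q_beta.
    dist_refine = sum(zi ^ ui ^ qi for zi, ui, qi in zip(z, u, q_beta)) / m
    return dist_base, dist_refine
```

With `alpha = 0.25` and `beta = 0.1` the first distance concentrates around `conv(0.25, 0.1)` and the second around 0.25, matching the two thresholds used by the state 1 decoder.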